Artificial Neural Network Explained


What is an Artificial Neural Network (ANN)?
An ANN is based on a collection of connected units or nodes called artificial neurons (analogous to biological neurons in an animal brain). Each connection (analogous to a synapse) between artificial neurons can transmit a signal from one to another. The artificial neuron that receives the signal can process it and then signal artificial neurons connected to it.

The neuron: basic building block of an ANN.

Dendrites: receivers of signals for the neuron.
Axon: transmitter of signals from the neuron.

Inputs and Outputs:


Input Values:
1. They are independent of each other.
2. They need to be standardized (mean 0 and variance 1).
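
As a rough illustration of what standardizing an input means, here is a minimal Python sketch. The function name standardize and the ages column are made up for this example, and it assumes the population standard deviation is used; libraries may differ on that detail.

```python
import statistics

def standardize(values):
    """Rescale a list of numbers so that it has mean 0 and variance 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (an assumption)
    if std == 0:
        # A constant column has no spread; map it to zeros to avoid dividing by 0.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

ages = [20, 30, 40, 50, 60]  # hypothetical input column
scaled = standardize(ages)
print(scaled)
```

After this step every input column is on the same scale, so no single input dominates the weighted sum just because of its units.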

Output Values:
1. Continuous
2. Binary
3. Categorical - several output values.

Note: Every input value is given a weight, and these weights determine how strongly each signal passes through to the neuron. Weights are crucial to the ANN because adjusting them is how the ANN learns. By adjusting the weights, the ANN decides which signals are important and which are not.

What happens in the neuron?
Step 1: all weighted input values are summed up.
Step 2: an activation function is applied.
Step 3: result is passed to the output value.
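
The three steps can be sketched in a few lines of Python. This is only an illustration: the names neuron_output and sigmoid, and the input and weight values, are made up here, and the sigmoid is just one possible choice of activation function. A bias term is left out because the steps above do not mention one.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, activation):
    # Step 1: sum up all weighted input values.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # Step 2: apply the activation function.
    # Step 3: the result is the neuron's output value.
    return activation(weighted_sum)

inputs = [0.5, -1.2, 0.3]    # hypothetical standardized inputs
weights = [0.8, 0.1, -0.4]   # hypothetical weights
output = neuron_output(inputs, weights, sigmoid)
print(output)
```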

Activation Functions:
Activation functions are an extremely important feature of artificial neural networks. They decide whether a neuron should be activated or not, that is, whether the information the neuron is receiving is relevant to the prediction or should be ignored.

Types:
1. Threshold function: It is a very simple function. If the value is less than 0, the threshold function passes a 0, and if the value is greater than or equal to 0, it passes a 1.
So basically it is a binary (yes/no) function.
2. Sigmoid function: It is a smooth and gradual function. This function is very useful in the final output layer when we are trying to predict probabilities.
Here, the output always lies between 0 and 1: large negative values are pushed towards 0, an input of 0 gives 0.5, and large positive values approach 1.
3. Rectifier function (ReLU): This is one of the most widely used functions in ANNs.
The ReLU function is non-linear, which means we can stack multiple layers of neurons activated by ReLU and still backpropagate the errors through them. The main advantage of using the ReLU function over other activation functions is that it does not activate all the neurons at the same time. What does this mean? If the input is negative, ReLU converts it to zero and the neuron does not get activated; positive inputs pass through unchanged. This means that at any given time only a few neurons are activated, making the network sparse and therefore efficient and easy to compute.
4. Hyperbolic Tangent (tanh) function: This function is similar to the sigmoid, but in this case the output can go below 0. It ranges from -1 to 1.
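
The four activation functions can be written out in plain Python for comparison. This is a sketch rather than a reference implementation: the function names are chosen for this example, and the threshold function is assumed to switch from 0 to 1 at an input of 0.

```python
import math

def threshold(z):
    # Binary step: 0 for negative input, 1 otherwise (cut-off at 0 is an assumption).
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    # Smooth curve with output strictly between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    # Negative input becomes 0; positive input passes through unchanged.
    return max(0.0, z)

def tanh(z):
    # Shaped like the sigmoid, but the output lies between -1 and 1.
    return math.tanh(z)

for z in (-2.0, 0.0, 2.0):
    print(z, threshold(z), round(sigmoid(z), 3), relu(z), round(tanh(z), 3))
```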

Note:
If the response/dependent variable is binary, which activation function should we choose?
1. Threshold function: because it outputs either 0 or 1.
2. Sigmoid function: because it gives values between 0 and 1, so we can calculate the probability of each class.

How do Neural Networks work?

Let's start from the top neuron in the hidden layer. All the input values are fed to the neuron. Some inputs will have zero weights and some will have non-zero weights. The neuron effectively uses only those inputs which have non-zero weights.
This happens for all the neurons, so only a few important features/inputs end up contributing to each neuron. In effect, feature selection takes place based on the weights.
Say, out of 10 inputs/features, neuron 1 may pick up only 2 inputs, neuron 2 only 3 inputs, neuron 3 no inputs, and so on.

Then all the values of the neurons are put together and finally we get an output value.
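
A small Python sketch of this forward pass is shown below. Everything in it is hypothetical: the weights are hand-picked so that each hidden neuron uses only some of the inputs, ReLU is assumed for the hidden layer, and the output neuron simply adds up the weighted hidden values.

```python
def relu(z):
    return max(0.0, z)

def layer_forward(inputs, weight_rows, activation):
    """Compute the output of every neuron in a layer; one row of weights per neuron."""
    return [
        activation(sum(x * w for x, w in zip(inputs, row)))
        for row in weight_rows
    ]

inputs = [1.0, 2.0, 3.0, 4.0]  # hypothetical input values

hidden_weights = [
    [0.5, 0.0, 0.0, 0.25],  # neuron 1 uses only inputs 1 and 4
    [0.0, 1.0, -1.0, 0.0],  # neuron 2 uses only inputs 2 and 3
    [0.0, 0.0, 0.0, 0.0],   # neuron 3 uses no inputs at all
]
output_weights = [[1.0, 2.0, 3.0]]  # a single output neuron

hidden = layer_forward(inputs, hidden_weights, relu)
output = layer_forward(hidden, output_weights, lambda z: z)[0]
print(hidden, output)
```

An input with a zero weight contributes nothing to that neuron's sum, which is how the weights end up acting as a per-neuron feature selector.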

How do Neural Networks learn?
Here we have a very basic neural network with just 1 layer, called a perceptron.
Now, inputs are fed to the neuron and an activation function is applied. Then we get an output. This predicted output (ŷ) is compared to the actual value (y).
Now a cost function, C, is calculated. The cost function tells us how much error we have in our prediction.
The main aim is to reduce the cost function C as much as possible. The lower the cost function, the closer the predicted value (ŷ) is to the actual value (y).
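
The post does not say which cost function is used. As an assumption for illustration, the sketch below uses the common squared-error form C = 0.5 * (ŷ - y)^2; the function name cost and the numbers are made up.

```python
def cost(y_hat, y):
    # Squared-error cost for a single row (assumed form, not stated in the post).
    return 0.5 * (y_hat - y) ** 2

y = 1.0                   # actual value
close_guess = cost(0.9, y)  # prediction close to y gives a small cost
far_guess = cost(0.2, y)    # prediction far from y gives a larger cost
print(close_guess, far_guess)
```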

Now that we have compared the values, we feed this information back to the neural network and the weights get updated. This is called back-propagation.
The cost function above is for only 1 row of our data. We feed the same row over and over again for a number of iterations, updating the weights after each iteration, until the cost function stops decreasing.
What happens when we have multiple rows?
In this case, all the rows are fed to the neuron one by one and the cost function is calculated. Here, the final cost function is the sum of the cost functions of the individual rows.
After we get the cost function, this information is back-propagated to the neuron and the weights are updated.
In the 2nd iteration, the same process takes place and we get a new cost function, which is again back-propagated. This goes on until the cost function is minimized.
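
The loop described above can be sketched for a single sigmoid neuron as follows. Several things here are assumptions rather than the post's own method: the squared-error cost, plain gradient descent as the update rule, the learning rate, the number of iterations, and the toy data (an OR-style problem with a constant third input standing in for a bias).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(row, weights):
    return sigmoid(sum(x * w for x, w in zip(row, weights)))

def total_cost(rows, targets, weights):
    # Final cost = sum of the costs of the individual rows.
    return sum(0.5 * (predict(r, weights) - y) ** 2 for r, y in zip(rows, targets))

def train(rows, targets, weights, learning_rate=0.5, iterations=2000):
    history = []
    for _ in range(iterations):
        gradients = [0.0] * len(weights)
        for row, y in zip(rows, targets):
            y_hat = predict(row, weights)
            # Derivative of 0.5*(y_hat - y)^2 through the sigmoid.
            delta = (y_hat - y) * y_hat * (1.0 - y_hat)
            for i, x in enumerate(row):
                gradients[i] += delta * x
        # Update the weights once per iteration, after all rows have been seen.
        weights = [w - learning_rate * g for w, g in zip(weights, gradients)]
        history.append(total_cost(rows, targets, weights))
    return weights, history

# Hypothetical toy data; the last input is always 1.0 and acts as a bias.
rows = [[0.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
targets = [0.0, 1.0, 1.0, 1.0]

weights, history = train(rows, targets, [0.0, 0.0, 0.0])
print(history[0], history[-1])
```

Printing the first and last entries of history shows the total cost before and after many rounds of weight updates.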


