Advanced Optimization Methods: Artificial Neural Network Part 2

The previous part was a brief overview of how to construct a neural network. In this part we will go more in depth and actually do a little math to build these networks. As a quick reminder from part 1, our general network contains three layers: one input layer, one hidden layer, and one output layer. For this example I will try to maintain generality. We start with an input layer neuron.

Input layer neurons are the simplest (and the most boring) neurons in the system. ILN do not accept any inputs, because they are the inputs. Their job is simply to pass the inputs on to the hidden layer. There is exactly one input layer for any given NN (as far as I know). Each variable we are looking at gets its own input layer neuron: if we are looking at 3 variables we have 3 ILN, 5 variables means 5 ILN, 100 variables means 100 ILN, and so on. They do not modify or weight the inputs, nor do they apply the activation function before passing the inputs on. The next layer is the hidden layer, which is much more interesting.
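Just to make that concrete, here is a tiny sketch (the variable names and numbers are my own illustration, not from the post): the input layer is really just the raw feature vector, one entry per variable, handed along untouched.

```python
# Minimal sketch: the "input layer" is just the raw feature vector.
# One input-layer neuron per variable; no weights, no activation.
inputs = [0.5, 1.2, -0.3]   # 3 variables -> 3 input-layer neurons

def input_layer(x):
    """Pass the inputs straight through to the hidden layer."""
    return list(x)

print(input_layer(inputs))  # [0.5, 1.2, -0.3]
```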

[Figure: schematic of an artificial neuron model]

Above is a schematic of our hidden layer neuron (HLN). The x values here come from our input layer (or from a previous hidden layer). These values are weighted and summed, giving us a new value:

z = \sum_{i} w_i x_i

This new z value is passed to our activation function. Last post we skipped over the activation function, but it deserves some attention now. The point of the activation function is to introduce non-linearity into our method. For this neural network I have chosen the sigmoid function. Why did I choose this function? For one, it is real-valued and differentiable. Second, it has an interesting property: the derivative of the sigmoid function is equal to the sigmoid function times one minus the sigmoid function, which makes the derivative computationally cheap. Mostly, though, the sigmoid function is used for the non-linearity I mentioned before. Below is the function (from Wikipedia) as well as the equation and derivative property.

[Figure: the logistic (sigmoid) curve, from Wikipedia]

\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \sigma'(z) = \sigma(z)(1 - \sigma(z))
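As a rough sketch of what one hidden-layer neuron does (the weights and inputs below are made-up numbers, purely for illustration): weight the incoming values, sum them into z, and push z through the sigmoid. The last few lines also check the handy derivative property numerically.

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) activation: 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def hidden_neuron(x, w):
    """Weight and sum the inputs, then apply the activation."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid(z)

x = [0.5, 1.2, -0.3]     # outputs of the input layer
w = [0.4, -0.6, 0.9]     # made-up weights for this neuron
print(hidden_neuron(x, w))

# Derivative property: sigma'(z) = sigma(z) * (1 - sigma(z))
z = 0.8
analytic = sigmoid(z) * (1 - sigmoid(z))
numeric = (sigmoid(z + 1e-6) - sigmoid(z - 1e-6)) / 2e-6
print(analytic, numeric)  # the two values agree closely
```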

Using this activation function, we take our z value, the summation of our newly weighted inputs, and put it through the sigmoid function. The result of the activation function becomes the input for the next layer, which is our output layer. However, before we move on to an output neuron we have to decide how many hidden layers we need and how many neurons go in each of those layers. For most applications one hidden layer is appropriate; adding more layers usually gives only marginally better results. There are plenty of differing opinions on the number of hidden layers needed, and even more disagreement about the number of hidden neurons in those layers. A quick heuristic is that the number of hidden neurons should fall somewhere between the number of input neurons and the number of output neurons. However, this is just a rule of thumb and should not be taken as a hard and fast rule. Really the best way is to test out your network and “prune” out those neurons that aren’t contributing. Before we get to that we have to look at our last layer, the output layer.
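Before moving on, here is a small sketch that makes the rule of thumb concrete (my own illustration, not a prescription from the post): pick a hidden-layer size between the input and output counts, then shape the weight lists accordingly.

```python
import random

n_inputs, n_outputs = 5, 1

# Heuristic: hidden size somewhere between the input and output counts.
n_hidden = round((n_inputs + n_outputs) / 2)   # here: 3

# One weight per (input neuron, hidden neuron) pair, and one per
# (hidden neuron, output neuron) pair; values start out random.
w_hidden = [[random.uniform(-1, 1) for _ in range(n_inputs)]
            for _ in range(n_hidden)]
w_output = [[random.uniform(-1, 1) for _ in range(n_hidden)]
            for _ in range(n_outputs)]

print(n_hidden, len(w_hidden), len(w_output))
```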

An output layer neuron works just like a hidden layer neuron, except that the output of the OLN is the answer or prediction we have been looking for. The output layer takes all the results of the activation functions from the previous layer and weights them. It then sums the weighted values and puts them through our sigmoid function again. Once that is done our method is complete. Like the input layer, there is only one output layer. The number of neurons in this layer depends on the problem: if we are trying to predict a single value only one neuron is needed, but if we are trying to classify a data point we may need more than one neuron.
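Putting the three layers together, a minimal forward pass might look like the sketch below (again, the weights are random placeholders; a real network would learn them during training, which is the subject of the next part).

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights):
    """One layer: weight and sum the inputs for each neuron, then activate."""
    return [sigmoid(sum(w * x for w, x in zip(neuron_w, inputs)))
            for neuron_w in weights]

def forward(x, w_hidden, w_output):
    """The input layer passes x through; hidden and output layers transform it."""
    hidden_out = layer(x, w_hidden)
    return layer(hidden_out, w_output)

x = [0.5, 1.2, -0.3]                                      # 3 input-layer neurons
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_output = [[random.uniform(-1, 1) for _ in range(2)]]    # single prediction
print(forward(x, w_hidden, w_output))                     # e.g. [0.57...]
```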

One big thing we have overlooked so far is why and how an ANN works. We have to fine-tune the weights for each neuron (w_ij); by changing and optimizing these values we can make the network give us the information we want and need. This is called training our network, and we will go over it in the next part.

-Marcello
