Code that accompanies this article can be downloaded here.
Sometime in the last few weeks, while I was writing explanations of how neural networks learn and of the backpropagation algorithm, I realized that I had never tried to implement these algorithms in a programming language. Then it struck me that I had never tried to implement a whole Artificial Neural Network from scratch. I had always used libraries that hid the implementation from me so I could focus on the mathematical model and the problem I was trying to solve. One thing led to another, and I decided to implement my own Neural Network from scratch, without third-party libraries. I also decided to use the object-oriented programming language I prefer: C#.
This means that a more object-oriented approach was taken, rather than the usual scripting point of view we would have with Python or R. One very good article implemented the network that way, and I strongly recommend skimming through it. What I wanted to do was separate every component and every operation. What was initially just a thought exercise grew into quite a cool mini side-project, so I decided to share it with the world. Before we dive into the code, I would like to emphasize that this is not really the way you would generally implement a network. In practice, more of the math would be expressed as matrix multiplication to optimize the entire process.
Apart from that, the implemented network is a simplified, most basic form of a Neural Network. Nevertheless, this way one can see all the components and elements of an Artificial Neural Network and get more familiar with the concepts from previous articles.
Artificial Neural Network Structure
Before we dive into the code, let’s run through the structure of an ANN. In general, Artificial Neural Networks are biologically motivated, meaning that they try to mimic the behavior of real nervous systems. Just as the smallest building unit in a real nervous system is the neuron, the smallest building unit of an artificial neural network is the artificial neuron. In a real nervous system, these neurons are connected to each other by synapses, which gives the entire system enormous processing power, the ability to learn, and huge flexibility. Artificial neural networks apply the same principle.
By connecting artificial neurons, they aim to create a similar system. They group neurons into layers and then create connections among the neurons of those layers. By assigning a weight to each connection, they are able to distinguish important connections from unimportant ones. The structure of the artificial neuron mirrors the structure of the real neuron, too. Since a neuron can have multiple inputs, i.e. input connections, a special function is used to collect that data: the input function. The function usually used for this purpose sums all weighted inputs that are active on the input connections, and is called the weighted input function.
Another important part of each artificial neuron is the activation function. This function defines whether the neuron will send any signal to its outputs and which value will be propagated to them. Basically, this function receives a value from the input function, generates an output value based on it, and propagates that value to the outputs. If you need more details on the architecture of the artificial neural network, you can find them here.
So, as you can see from the previous chapter, there are a few important entities that we need to pay attention to and that we can abstract: neurons, connections, layers, and functions. In this solution, a separate class will implement each of these entities. Then, by putting it all together and adding the backpropagation algorithm on top of it, we will have our implementation of this simple neural network.
As mentioned before, the crucial parts of the neuron are the input function and the activation function. Let’s examine the input function. First I created an interface for this function so it can be easily changed in the neuron implementation later on:
Such a function has only one method, CalculateInput, which receives a list of connections described by the ISynapse interface. We will cover this abstraction later; for now, all we need to know is that this interface represents connections among neurons. The CalculateInput method needs to return a value based on the data contained in the list of connections. Then, I wrote the concrete implementation of the input function: the weighted sum function.
This function sums weighted values on all connections that are passed in the list.
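The listings from the original post are not reproduced here, so the following is only a minimal sketch of what the description implies. CalculateInput and ISynapse come from the text; the names IInputFunction, WeightedSumFunction and FixedSynapse, the reduced ISynapse with just Weight and GetOutput, and the choice to multiply the weight inside the input function are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reduced stand-in for the article's ISynapse: only the members this sketch needs.
public interface ISynapse
{
    double Weight { get; }
    double GetOutput(); // unweighted value arriving over this connection (assumption)
}

// Assumed name for the input-function abstraction.
public interface IInputFunction
{
    double CalculateInput(List<ISynapse> inputs);
}

// Weighted sum: adds up weight * value over all incoming connections.
public class WeightedSumFunction : IInputFunction
{
    public double CalculateInput(List<ISynapse> inputs)
    {
        return inputs.Sum(synapse => synapse.Weight * synapse.GetOutput());
    }
}

// Hypothetical fixed-value connection, used only to exercise the sketch.
public class FixedSynapse : ISynapse
{
    private readonly double _value;

    public FixedSynapse(double weight, double value)
    {
        Weight = weight;
        _value = value;
    }

    public double Weight { get; }
    public double GetOutput() => _value;
}

public static class Program
{
    public static void Main()
    {
        var inputs = new List<ISynapse>
        {
            new FixedSynapse(0.5, 2.0),
            new FixedSynapse(-1.0, 0.25)
        };

        // value: 0.5 * 2.0 + (-1.0) * 0.25 = 0.75
        Console.WriteLine(new WeightedSumFunction().CalculateInput(inputs));
    }
}
```

Keeping the function behind an interface is what lets a neuron be constructed with a different input function later without touching the neuron class itself.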
Taking the same approach as for the input function, the interface for activation functions is implemented first:
After that, concrete implementations can be written. The CalculateOutput method should return the output value of the neuron based on the input value that it got from the input function. I like to have options, so I’ve implemented all the functions mentioned in one of the previous blog posts. Here is how the step function looks:
Pretty straightforward, isn’t it? A threshold value is defined during the construction of the object, and then CalculateOutput returns 1 if the input value exceeds the threshold and 0 otherwise.
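As a hedged illustration of that description, the interface and the step function could be sketched as below. Only the method name CalculateOutput and the threshold behavior come from the text; the type names IActivationFunction and StepActivationFunction and the exact signature are assumptions.

```csharp
using System;

// Assumed shape of the activation-function abstraction: one method, input in, output out.
public interface IActivationFunction
{
    double CalculateOutput(double input);
}

// Step function: fires (1) only when the input strictly exceeds the threshold.
public class StepActivationFunction : IActivationFunction
{
    private readonly double _threshold;

    public StepActivationFunction(double threshold)
    {
        _threshold = threshold;
    }

    public double CalculateOutput(double input)
    {
        return input > _threshold ? 1.0 : 0.0;
    }
}

public static class Program
{
    public static void Main()
    {
        IActivationFunction step = new StepActivationFunction(0.5);

        Console.WriteLine(step.CalculateOutput(0.7)); // above the threshold, so the value is 1
        Console.WriteLine(step.CalculateOutput(0.5)); // not strictly above, so the value is 0
    }
}
```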
Other functions are easy as well. Here is the Sigmoid activation function implementation:
And here is the Rectifier activation function implementation:
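Since those two listings are missing as well, here is a minimal sketch using the standard textbook definitions: the logistic sigmoid 1 / (1 + e^(-x)) and the rectifier max(0, x). The class names are assumptions, and the original implementation may differ (for example, by adding a steepness coefficient to the sigmoid). The one-method interface is redeclared so the block compiles on its own.

```csharp
using System;

public interface IActivationFunction
{
    double CalculateOutput(double input);
}

// Logistic sigmoid: squashes any real input into the open interval (0, 1).
public class SigmoidActivationFunction : IActivationFunction
{
    public double CalculateOutput(double input)
    {
        return 1.0 / (1.0 + Math.Exp(-input));
    }
}

// Rectifier (ReLU): passes positive inputs through unchanged and clamps the rest to 0.
public class RectifiedActivationFunction : IActivationFunction
{
    public double CalculateOutput(double input)
    {
        return Math.Max(0.0, input);
    }
}

public static class Program
{
    public static void Main()
    {
        IActivationFunction sigmoid = new SigmoidActivationFunction();
        IActivationFunction rectifier = new RectifiedActivationFunction();

        Console.WriteLine(sigmoid.CalculateOutput(0.0));    // value: 0.5, the midpoint of the curve
        Console.WriteLine(rectifier.CalculateOutput(-2.0)); // value: 0
        Console.WriteLine(rectifier.CalculateOutput(3.0));  // value: 3
    }
}
```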
So far so good: we have implementations for the input and activation functions, and we can proceed to implement the trickier parts of the network, neurons and connections.
The workflow that a neuron should follow goes like this:
1. Receive input values from one or more weighted input connections.
2. Collect those values and pass them to the activation function, which calculates the output value of the neuron.
3. Send that value to the outputs of the neuron.
Based on that workflow, this abstraction of the neuron is created:
Before we explain each property and method, let’s see the concrete implementation of a neuron, since that will make the way it works far clearer:
Each neuron has its own unique identifier, Id. This property is used later in the backpropagation algorithm. Another property added for backpropagation purposes is PreviousPartialDerivate, which will be examined in detail further on. A neuron has two lists, one for input connections, Inputs, and another for output connections, Outputs. It also has two fields, one for each of the functions described in previous chapters. They are initialized through the constructor. This way, neurons with different input and activation functions can be created.
This class has some interesting methods, too. AddInputNeuron and AddOutputNeuron are used to create connections among neurons. The first one adds an input connection to some neuron and the second one adds an output connection to some neuron. AddInputSynapse adds an InputSynapse to the neuron, which is a special type of connection. These special connections are used only in the input layer of the network, i.e. only to feed input into the system as a whole. This will be covered in more detail in the next chapter.
Last but not least, the CalculateOutput method is used to start a chain reaction of output calculation. What happens when this function is called? It calls the input function, which requests values from all input connections. In turn, these connections request output values from their input neurons, i.e. the output values of neurons from the previous layer. This process repeats until the input layer is reached and the input values are propagated through the system.
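The chain reaction described above can be sketched as follows. This is an illustration under stated assumptions, not the original class: the input and activation functions are passed as plain delegates instead of interface instances to keep the block short, the Outputs list and PreviousPartialDerivate are left out, Id is assumed to be a Guid, and the weight parameter on AddInputNeuron is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface ISynapse
{
    double Weight { get; }
    double GetOutput();
}

// Connection fed by another neuron: asking it for a value asks that neuron to compute.
public class Synapse : ISynapse
{
    private readonly Neuron _fromNeuron;

    public Synapse(Neuron fromNeuron, double weight)
    {
        _fromNeuron = fromNeuron;
        Weight = weight;
    }

    public double Weight { get; }
    public double GetOutput() => _fromNeuron.CalculateOutput();
}

// Connection that feeds an external value into the network; its weight is fixed at 1.
public class InputSynapse : ISynapse
{
    private readonly double _value;
    public InputSynapse(double value) { _value = value; }
    public double Weight => 1.0;
    public double GetOutput() => _value;
}

public class Neuron
{
    private readonly Func<List<ISynapse>, double> _inputFunction;
    private readonly Func<double, double> _activationFunction;

    public Guid Id { get; } = Guid.NewGuid();
    public List<ISynapse> Inputs { get; } = new List<ISynapse>();

    public Neuron(Func<List<ISynapse>, double> inputFunction, Func<double, double> activationFunction)
    {
        _inputFunction = inputFunction;
        _activationFunction = activationFunction;
    }

    public void AddInputNeuron(Neuron inputNeuron, double weight) => Inputs.Add(new Synapse(inputNeuron, weight));
    public void AddInputSynapse(double value) => Inputs.Add(new InputSynapse(value));

    // Pulls values through the input connections, then applies the activation function.
    public double CalculateOutput() => _activationFunction(_inputFunction(Inputs));
}

public static class Program
{
    public static void Main()
    {
        Func<List<ISynapse>, double> weightedSum = inputs => inputs.Sum(s => s.Weight * s.GetOutput());
        Func<double, double> identity = x => x;
        Func<double, double> rectifier = x => Math.Max(0.0, x);

        var first = new Neuron(weightedSum, identity);
        first.AddInputSynapse(1.0);
        var second = new Neuron(weightedSum, identity);
        second.AddInputSynapse(0.5);

        var output = new Neuron(weightedSum, rectifier);
        output.AddInputNeuron(first, 2.0);
        output.AddInputNeuron(second, 3.0);

        // Calling the last neuron pulls values all the way back from the input synapses.
        Console.WriteLine(output.CalculateOutput()); // value: 2.0 * 1.0 + 3.0 * 0.5 = 3.5
    }
}
```

One consequence of this pull-based design is that an upstream neuron is recomputed once for every downstream neuron that asks for its output, which is part of why the approach is easy to follow but not fast.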
Connections are abstracted through the ISynapse interface:
Every connection has a weight, represented by the property of the same name. An additional property, PreviousWeight, is used during backpropagation of the error through the system. Updating the current weight and storing the previous one is done in the helper function UpdateWeight. There is another helper function, IsFromNeuron, which detects whether a certain neuron is the input neuron of the connection. Of course, there is also a method that gets the output value of the connection, GetOutput.
Here is the implementation of the connection:
Notice the fields _fromNeuron and _toNeuron, which define neurons that this synapse connects.
Apart from this implementation of the connection, there is another one that I’ve mentioned in the previous chapter about neurons. It is InputSynapse and it is used as an input to the system. The weight of these connections is always 1 and it is not updated during the training process. Here is the implementation of it:
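Putting the last few paragraphs together, the connection abstraction and its two implementations might look roughly like this. The member names (Weight, PreviousWeight, UpdateWeight, IsFromNeuron, GetOutput, _fromNeuron, _toNeuron) come from the text; every signature, the reduced INeuron, the ConstantNeuron stub, and the update rule inside UpdateWeight are assumptions made so the sketch compiles on its own.

```csharp
using System;

// Reduced neuron abstraction, just enough for the connections below (assumption).
public interface INeuron
{
    Guid Id { get; }
    double CalculateOutput();
}

public interface ISynapse
{
    double Weight { get; }
    double PreviousWeight { get; }
    double GetOutput();
    bool IsFromNeuron(Guid fromNeuronId);
    void UpdateWeight(double learningRate, double delta);
}

public class Synapse : ISynapse
{
    private readonly INeuron _fromNeuron;
    private readonly INeuron _toNeuron; // kept to mirror the text; not read in this sketch

    public Synapse(INeuron fromNeuron, INeuron toNeuron, double weight)
    {
        _fromNeuron = fromNeuron;
        _toNeuron = toNeuron;
        Weight = weight;
        PreviousWeight = weight;
    }

    public double Weight { get; private set; }
    public double PreviousWeight { get; private set; }

    public double GetOutput() => _fromNeuron.CalculateOutput();
    public bool IsFromNeuron(Guid fromNeuronId) => _fromNeuron.Id == fromNeuronId;

    // Remembers the old weight, then nudges the current one (assumed update rule).
    public void UpdateWeight(double learningRate, double delta)
    {
        PreviousWeight = Weight;
        Weight += learningRate * delta;
    }
}

// Entry point for external data: weight is always 1 and training never changes it.
public class InputSynapse : ISynapse
{
    private readonly double _value;
    public InputSynapse(double value) { _value = value; }

    public double Weight => 1.0;
    public double PreviousWeight => 1.0;
    public double GetOutput() => _value;
    public bool IsFromNeuron(Guid fromNeuronId) => false;
    public void UpdateWeight(double learningRate, double delta) { }
}

// Hypothetical stub neuron with a fixed output, used only to exercise the sketch.
public class ConstantNeuron : INeuron
{
    private readonly double _output;
    public ConstantNeuron(double output) { _output = output; }
    public Guid Id { get; } = Guid.NewGuid();
    public double CalculateOutput() => _output;
}

public static class Program
{
    public static void Main()
    {
        var from = new ConstantNeuron(0.8);
        var to = new ConstantNeuron(0.0);
        var synapse = new Synapse(from, to, 1.0);

        synapse.UpdateWeight(0.5, 0.5);
        Console.WriteLine(synapse.PreviousWeight);        // value: 1
        Console.WriteLine(synapse.Weight);                // value: 1.0 + 0.5 * 0.5 = 1.25
        Console.WriteLine(synapse.IsFromNeuron(from.Id)); // True
    }
}
```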
Implementation of the neural layer is quite easy:
It contains the list of neurons used in that layer and the ConnectLayers method, which is used to glue two layers together.
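A minimal sketch of such a layer is shown below. It assumes that "gluing" two layers means connecting every neuron of the input layer to every neuron of the current layer, which the text does not state explicitly. The Neuron class here is a deliberately stripped-down stand-in that only records who feeds it, and the constructor taking a neuron count is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Stripped-down neuron: only tracks its upstream neurons.
public class Neuron
{
    public List<Neuron> InputNeurons { get; } = new List<Neuron>();

    public void AddInputNeuron(Neuron inputNeuron) => InputNeurons.Add(inputNeuron);
}

public class NeuralLayer
{
    public List<Neuron> Neurons { get; } = new List<Neuron>();

    public NeuralLayer(int neuronCount)
    {
        for (int i = 0; i < neuronCount; i++)
        {
            Neurons.Add(new Neuron());
        }
    }

    // Fully connects inputLayer to this layer (assumed meaning of "glue").
    public void ConnectLayers(NeuralLayer inputLayer)
    {
        foreach (var neuron in Neurons)
        {
            foreach (var inputNeuron in inputLayer.Neurons)
            {
                neuron.AddInputNeuron(inputNeuron);
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var inputLayer = new NeuralLayer(3);
        var hiddenLayer = new NeuralLayer(2);
        hiddenLayer.ConnectLayers(inputLayer);

        // Each of the 2 hidden neurons now has 3 incoming connections.
        Console.WriteLine(hiddenLayer.Neurons[0].InputNeurons.Count); // value: 3
    }
}
```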
Simple Artificial Neural Network
Now, let’s put all that together and add backpropagation to it. Take a look at the implementation of the Network itself:
This class contains a list of neural layers and a layer factory, a class that is used to create new layers. During construction of the object, the initial input layer is added to the network. Other layers are added through the AddLayer function, which adds the passed layer on top of the current layer list. The GetOutput method activates the output layer of the network, thus initiating a chain reaction through the network. This class also has a few helper methods, such as PushExpectedValues, which sets the desired values for the training set that will be passed during training, and PushInputValues, which sets a certain input to the network.
The most important method of this class is the Train method. It receives the training set and the number of epochs. For each epoch, it runs the whole training set through the network as explained in this article. Then, the output is compared with the desired output and the functions HandleOutputLayer and HandleHiddenLayer are called. These functions implement the backpropagation algorithm as described in this article.
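To make the weight-update idea concrete, here is a small sketch of the standard delta rule for a single sigmoid output neuron trained with squared error. It is not the code of HandleOutputLayer; the function TrainStep, the learning rate, and the sample values are all assumptions chosen for illustration.

```csharp
using System;

public static class Program
{
    private static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    // One gradient-descent step for a single sigmoid neuron.
    // Returns the neuron's output as it was before the weights were updated.
    public static double TrainStep(double[] weights, double[] inputs, double target, double learningRate)
    {
        double sum = 0.0;
        for (int i = 0; i < weights.Length; i++)
        {
            sum += weights[i] * inputs[i];
        }

        double output = Sigmoid(sum);

        // Derivative of E = 0.5 * (output - target)^2 with respect to the weighted sum:
        // (output - target) times the sigmoid derivative output * (1 - output).
        double delta = (output - target) * output * (1.0 - output);

        for (int i = 0; i < weights.Length; i++)
        {
            weights[i] -= learningRate * delta * inputs[i];
        }

        return output;
    }

    public static void Main()
    {
        var weights = new[] { 0.1, -0.2 };
        var inputs = new[] { 1.0, 0.5 };
        const double target = 1.0;

        double before = TrainStep(weights, inputs, target, 0.5);
        double after = before;
        for (int epoch = 1; epoch < 1000; epoch++)
        {
            after = TrainStep(weights, inputs, target, 0.5);
        }

        // The output should have moved from its starting value toward the target.
        Console.WriteLine($"output before training: {before:F3}, after: {after:F3}");
    }
}
```

For a hidden neuron the same update applies, except that the (output - target) term is replaced by the sum of the downstream neurons' deltas, each multiplied by the weight of the connection leading to it.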
A typical workflow can be seen in one of the tests implemented in the code in the repository, Train_RuningTraining_NetworkIsTrained. It goes something like this:
First, a neural network object is created. The constructor defines that there will be three neurons in the input layer. After that, two layers are added using the AddLayer function and the layer factory. For each layer, the number of neurons and the functions for each neuron are defined. Once this part is completed, the expected outputs are defined and the Train function is called with the input training set and the number of epochs.
This implementation of the neural network is far from optimal. You will notice plenty of nested for loops, which certainly hurt performance. Also, in order to simplify the solution, some components of a neural network were not introduced in this first iteration of the implementation, momentum and bias, for example. Nevertheless, the goal was not to implement a high-performance network, but to analyze and display the important elements and abstractions that every Artificial Neural Network has.
Thanks for reading!
Read more posts from the author at Rubik’s Code.