The code that accompanies this article can be downloaded here.
A couple of days ago, news surfaced on the web about an AI that can detect shoplifters even before they commit a crime. Not long after that, we could read about a GAN network that creates photorealistic images from simple sketches. Even though this news left me amazed, I was hardly surprised. Machine learning and deep learning dominate the field of image classification and segmentation, and engineers are coming up with more and more interesting solutions. From Facebook tag suggestions to self-driving cars, neural networks really took over this world.
In fact, behind all these successes lies the concept of Convolutional Neural Networks, which we explained in this article. This type of neural network was created back in the 1990s by Yann LeCun, today’s director of AI research at Facebook. Similar to other ideas in the field, this one also has roots in biology. Researchers detected that individual neurons in the visual cortex respond to stimuli only in a restricted region of the visual field known as the receptive field.
Because these fields of different neurons overlap, together they cover the entire visual field. This effectively means that certain neurons are activated only if a certain attribute is present in the visual field, for example, a horizontal edge. So, one group of neurons will be “fired up” if there is a horizontal edge in your visual field, and a different group will be activated if there is, let’s say, a vertical edge.
For example, take a look at this image and tell us what you see:
This is a well-known optical illusion, which first appeared in a German humor magazine back in 1892. As you may notice, you can see either a duck or a rabbit, depending on how you observe the image. What is happening in this and similar illusions is that they use the previously mentioned functions of the visual cortex to confuse us. Take a look at the same image below:
If your attention wanders to the area marked with the red square, you will say that you see a duck. However, if you focus on the area of the image marked with the blue square, you will say that you see a rabbit. In other words, when you observe certain features of the image, different groups of neurons get activated, and you classify the image either as a rabbit or as a duck. This is exactly the functionality that Convolutional Neural Networks utilize. They detect features in the image and then classify the image based on that information.
We will not go into the details of how Convolutional Neural Networks work here; that is covered in the article mentioned above.
In one of the previous articles, we implemented this type of neural network using Python and Keras. We created a neural network that is able to detect and classify handwritten digits. For that purpose, we used the MNIST dataset, which is well known in the world of neural networks. It extends its predecessor, NIST, and consists of a training set of 60,000 samples and a test set of 10,000 images of handwritten digits. All digits have been size-normalized and centered, and the size of the images is fixed at 28×28 pixels. This is why this dataset is so popular.
Using Convolutional Neural Networks, we can achieve near-human results. The record accuracy on this dataset is held by the Parallel Computing Center (Khmelnitskiy, Ukraine). They used an ensemble of only 5 convolutional neural networks and got an error rate of 0.21 percent, meaning they give correct results in 99.79% of the cases. Awesome, isn’t it?
To serve the solution locally, we use http-server, a simple static HTTP server. It is installed using the command:

> npm install http-server

Once installed, it is run from the solution’s directory using the command:

> http-server
In order to load the aforementioned data faster, the folks at Google provided us with a sprite file and with code for managing that sprite file. However, I had to tweak this code a little bit to better fit my needs, so here is how it looks:
Here are some major points of this MnistData class. Images and their labels are loaded into the fields trainImages, trainLabels, testImages and testLabels, while batches of samples are retrieved using the nextTrainBatch and nextTestBatch methods.
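Since the sprite file stores all images in one big flattened buffer, batching essentially amounts to copying out slices. Below is an illustrative sketch of that logic in plain JavaScript, not the actual Google-provided code; the names nextBatch and index are hypothetical:

```javascript
// Each MNIST image is 28 * 28 = 784 floats; each label is a
// one-hot vector with 10 entries.
const IMAGE_SIZE = 784;
const NUM_CLASSES = 10;

// `index` is a function returning the next (possibly shuffled) sample index.
function nextBatch(batchSize, images, labels, index) {
  const batchImages = new Float32Array(batchSize * IMAGE_SIZE);
  const batchLabels = new Uint8Array(batchSize * NUM_CLASSES);
  for (let i = 0; i < batchSize; i++) {
    const idx = index();
    // Copy one image slice and its one-hot label into the batch buffers.
    batchImages.set(
      images.subarray(idx * IMAGE_SIZE, (idx + 1) * IMAGE_SIZE),
      i * IMAGE_SIZE
    );
    batchLabels.set(
      labels.subarray(idx * NUM_CLASSES, (idx + 1) * NUM_CLASSES),
      i * NUM_CLASSES
    );
  }
  return { images: batchImages, labels: batchLabels };
}
```

In the real class, these flat buffers are then wrapped into tensors of the appropriate shape before being fed to the model.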
The whole code that accompanies this blog post can be found here.
Let’s start with the index.html file of our solution. In the previous article, we presented several ways of installing TensorFlow.js. One of them is integrating it within a script tag of the HTML file, and that is what is done here:
Note that we add one script tag for TensorFlow.js and an additional one for tfjs-vis, a library used for visualizing training in the browser, along with a script tag for our own script.js file.
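For reference, the relevant script tags in index.html might look like the following; the CDN paths and versions are assumptions, so pin whichever release you actually use:

```html
<!-- TensorFlow.js core library -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@1.0.0/dist/tf.min.js"></script>
<!-- tfjs-vis, used for in-browser visualization -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-vis@1.0.2/dist/tfjs-vis.umd.min.js"></script>
<!-- Our own code -->
<script src="script.js"></script>
```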
Now, let’s examine the script.js file, where the main logic of our application resides.
You can notice that this function is similar to the one from the previous article. It reveals the workflow of the application: in the beginning, we load the data using a dedicated function, then we visualize it, build the model, train it, and finally evaluate it.
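The workflow just described can be sketched like this; apart from MnistData, the function names here are assumptions about the accompanying code, not its exact API:

```javascript
// Sketch of the top-level workflow of the application.
async function run() {
  const data = new MnistData();
  await data.load();             // download the sprite file and labels
  await showExamples(data);      // visualize a few input samples
  const model = getModel();      // build the convolutional network
  await trainModel(model, data); // train and report metrics per epoch
  await evaluateModel(model, data); // accuracy per class, confusion matrix
}

document.addEventListener('DOMContentLoaded', run);
```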
First, we create an object of the MnistData class, which is located in a separate file of the solution.
In this function, we first create a new tab called Input Data. Then we get a batch of test data using the nextTestBatch method and draw each sample onto a canvas element inside that tab.
Once the data is visualized, we can proceed to the more fun part of the implementation: building the model.
In order to better understand the layers used in this method, please refer to this article. Basically, we create two convolutional layers, each followed by a max-pooling layer. Finally, we flatten the data into an array and put it through a fully connected layer, which is in this case just one dense layer with 10 neurons. This last layer is the output layer, which predicts the class of the image. The model is then compiled with categorical cross-entropy loss and the Adam optimizer. Once we print the summary of the model with the summary method, we can inspect the output shape and number of parameters of every layer.
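A model matching that description could be defined as follows; this is a sketch, with the function name, filter counts and kernel sizes being illustrative choices rather than the article’s exact values:

```javascript
// Sketch of a CNN for 28x28 grayscale MNIST digits:
// two conv + max-pooling blocks, then flatten and a dense softmax layer.
function getModel() {
  const model = tf.sequential();

  // First convolutional block (filter count is an assumption).
  model.add(tf.layers.conv2d({
    inputShape: [28, 28, 1],
    kernelSize: 5,
    filters: 8,
    activation: 'relu'
  }));
  model.add(tf.layers.maxPooling2d({ poolSize: [2, 2], strides: [2, 2] }));

  // Second convolutional block.
  model.add(tf.layers.conv2d({ kernelSize: 5, filters: 16, activation: 'relu' }));
  model.add(tf.layers.maxPooling2d({ poolSize: [2, 2], strides: [2, 2] }));

  // Flatten and classify into the 10 digit classes.
  model.add(tf.layers.flatten());
  model.add(tf.layers.dense({ units: 10, activation: 'softmax' }));

  // Categorical cross-entropy loss with the Adam optimizer, as described.
  model.compile({
    optimizer: tf.train.adam(),
    loss: 'categoricalCrossentropy',
    metrics: ['accuracy']
  });

  model.summary(); // prints layer shapes and parameter counts
  return model;
}
```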
Cool! Now that we have prepared the input data and the model, we can train it. This is done inside the trainModel function:
In essence, we get a batch of train data and a batch of test data. Then we invoke the fit method on our model, passing the train data for training and the test data for evaluation. Metrics like loss and accuracy are displayed after each epoch using callbacks from the tfjs-vis library.
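That step could be sketched like this, assuming the MnistData batching API and the tfjs-vis fitCallbacks helper; the subset and batch sizes are illustrative:

```javascript
// Sketch of the training step: prepare tensors, then fit with
// tfjs-vis callbacks reporting loss/accuracy after each epoch.
async function trainModel(model, data) {
  const metrics = ['loss', 'val_loss', 'acc', 'val_acc'];
  const container = { name: 'Model Training', tab: 'Training' };
  const fitCallbacks = tfvis.show.fitCallbacks(container, metrics);

  const TRAIN_SIZE = 5500; // illustrative subset sizes
  const TEST_SIZE = 1000;

  // tf.tidy disposes intermediate tensors to keep GPU memory in check.
  const [trainXs, trainYs] = tf.tidy(() => {
    const d = data.nextTrainBatch(TRAIN_SIZE);
    return [d.xs.reshape([TRAIN_SIZE, 28, 28, 1]), d.labels];
  });
  const [testXs, testYs] = tf.tidy(() => {
    const d = data.nextTestBatch(TEST_SIZE);
    return [d.xs.reshape([TEST_SIZE, 28, 28, 1]), d.labels];
  });

  return model.fit(trainXs, trainYs, {
    batchSize: 512,
    validationData: [testXs, testYs],
    epochs: 20,
    shuffle: true,
    callbacks: fitCallbacks
  });
}
```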
The final step in this process is the evaluation of our model. For that purpose, we display the accuracy per digit and use the concept of a confusion matrix. This matrix is simply a table that is often used to describe the performance of a classification model. This is all done in the evaluation part of our script.
As you can see, this part of the code uses helper functions like displayAccuracyPerClass, displayConfusionMatrix and predict to make these graphs:
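Conceptually, building a confusion matrix boils down to counting how often each true class is predicted as each other class. Here is a minimal plain-JavaScript illustration (in TensorFlow.js itself, tf.math.confusionMatrix and tfvis.render.confusionMatrix cover this); the function name is hypothetical:

```javascript
// Build a numClasses x numClasses matrix where rows are true classes
// and columns are predicted classes; diagonal entries are correct predictions.
function confusionMatrix(trueLabels, predLabels, numClasses) {
  const matrix = Array.from({ length: numClasses }, () => new Array(numClasses).fill(0));
  for (let i = 0; i < trueLabels.length; i++) {
    matrix[trueLabels[i]][predLabels[i]] += 1;
  }
  return matrix;
}

// Tiny example with 3 classes: one of the "class 1" samples
// is misclassified as class 2.
const truth = [0, 1, 1, 2];
const preds = [0, 1, 2, 2];
const m = confusionMatrix(truth, preds, 3);
```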
We are getting pretty good results with this simple model and just 20 epochs. We could improve these results by adding additional convolutional layers or increasing the number of epochs. This would, of course, have an impact on the length of the training process.
In this article, we got a chance to see how we can utilize Convolutional Neural Networks with TensorFlow.js. We learned how to work with the layers that are specific to this type of neural network and how to run them inside a browser.
Thank you for reading!
Read more posts from the author at Rubik’s Code.