
In the previous article, we explored some of the basic PyTorch concepts, like tensors and gradients. We also implemented a simple linear regression using the framework and those concepts. Now, we focus on the real purpose of PyTorch. Since it is mainly a deep learning framework, PyTorch provides a number of ways to create different types of neural networks. In this article, we create two types of neural networks for image classification.



Deep learning and neural networks are the big buzzwords of the decade. Neural networks are based on elements of the biological nervous system and try to imitate its behavior. They are composed of small processing units (neurons) and weighted connections between them. The weight of a connection simulates the amount of neurotransmitters transferred between neurons. Mathematically, we can define a neural network as an ordered triple (N, C, w), where N is the set of neurons, C is the set {(i, j) | i, j ∈ N} whose elements are connections between neurons i and j, and w(i, j) is the weight of the connection between neurons i and j.

Usually, a neuron receives the outputs of many other neurons as its input. The propagation function combines these inputs with the weights of their connections to produce the so-called network input of that neuron. Often, this propagation function is just the sum of the weighted inputs, that is, a weighted sum. Mathematically, it is defined like this: net = Σ (i * w), where net is the network input, i is the value of each individual input, and w is the weight of the connection through which that input value arrived. After that, the network input is processed by the activation function. This function defines what will appear on the output of the neuron. Neurons are structured into layers, where every neuron of one layer is connected with all neurons of the neighboring layers.
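To make the weighted sum and the activation step concrete, here is a minimal sketch of a single neuron in plain Python. The function names and the threshold activation are assumptions chosen for illustration; they are not taken from the article's code.

```python
def weighted_sum(inputs, weights):
    """Propagation function: net = sum of i * w over all incoming connections."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(i * w for i, w in zip(inputs, weights))


def step_activation(net, threshold=0.0):
    """Illustrative activation (an assumption): output 1.0 once net exceeds the threshold."""
    return 1.0 if net > threshold else 0.0


def neuron_output(inputs, weights, activation=step_activation):
    """Output of one neuron: the activation applied to the network input."""
    return activation(weighted_sum(inputs, weights))


inputs = [1.0, 0.5, -2.0]
weights = [0.5, 2.0, 0.25]

net = weighted_sum(inputs, weights)       # 0.5 + 1.0 - 0.5 = 1.0
output = neuron_output(inputs, weights)   # 1.0, because net is above the threshold
print(net, output)
```

In a real network, the step function would be replaced by a differentiable activation so that gradients can flow during training.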


In this article, we implement neural networks for image classification of the Fashion MNIST dataset. This dataset is an “alternative version” of the standard MNIST dataset, which is often used as a “Hello world” example. In fact, the Fashion MNIST dataset has the same structure as the MNIST dataset, i.e. it has a training set of 60,000 images and a testing set of 10,000 images of clothes. There are 10 categories of clothes. The goal of the neural network is to learn these categories and be able to successfully categorize new images. All images in the dataset have been size-normalized and centered. The size of the images is fixed at 28×28, so very little image preprocessing is needed.


Building Feed Forward Neural Networks

The code presented in this article can be found on GitHub as well.

Finally, let’s start with the PyTorch implementation of neural networks. First, we need to import all necessary modules:

You can see that we are pretty much only using PyTorch modules (apart from NumPy and Matplotlib). Using the nn module, we are able to create different neural network layers, and using nn.functional we can apply different activation functions. Apart from the PyTorch libraries, we use some modules from the torchvision library. Namely, we use the FashionMNIST class, which provides the Fashion MNIST data. So, let’s load this data:

First, we create an object of the FashionMNIST class, which essentially contains all the necessary data. We split this data into training and validation sets. Training data is used during the training process of supervised learning, which is the method neural networks use to learn from the data. Validation data is also used during the training process, to evaluate how well the neural network performs. Usually, we would create a test set too, for the final evaluation of the network's performance on so far unseen data, but for this simple tutorial, this is enough. Then we use the DataLoader class to shuffle the data and separate it into batches that are fed to the neural network during each training step.
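The loading, splitting and batching steps described above might look roughly like the sketch below. The helper names, the `data/` directory, the split size and the batch size are all assumptions for illustration. So that the sketch runs offline, the split is demonstrated on a synthetic stand-in dataset with Fashion MNIST shapes; `load_fashion_mnist` shows the real torchvision call but is not executed here.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split


def load_fashion_mnist(root="data/"):
    """Download Fashion MNIST via torchvision (needs network access on first run)."""
    from torchvision.datasets import FashionMNIST
    from torchvision.transforms import ToTensor
    return FashionMNIST(root=root, train=True, download=True, transform=ToTensor())


def make_loaders(dataset, val_size, batch_size):
    """Split a dataset into training/validation parts and wrap both in DataLoaders."""
    train_size = len(dataset) - val_size
    train_ds, val_ds = random_split(dataset, [train_size, val_size])
    # Only the training data is shuffled; validation order does not matter.
    train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_ds, batch_size=batch_size)
    return train_loader, val_loader


# Synthetic stand-in (an assumption): 100 random "images" shaped like Fashion MNIST.
fake_images = torch.rand(100, 1, 28, 28)
fake_labels = torch.randint(0, 10, (100,))
train_loader, val_loader = make_loaders(
    TensorDataset(fake_images, fake_labels), val_size=20, batch_size=16
)

images, labels = next(iter(train_loader))
print(images.shape, labels.shape)
```

With the real data, the same helper could be called as `make_loaders(load_fashion_mnist(), val_size=10000, batch_size=128)`; those two numbers are again only example values.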


Ok, to the fun stuff, let’s build a neural network with PyTorch. Here is the complete FFNN class:

Essentially, when you want to build a model using PyTorch, you can inherit from the nn.Module class. This way, you can create different types of neural networks just by overriding several methods. This is one of the main reasons why PyTorch is so popular in the research community: it gives you “precooked” solutions with enough flexibility. We utilize that to create a model that receives several parameters through the constructor. It receives the input size (i.e. number of neurons in the input layer), the number of hidden layers and their size, the output size (i.e. number of neurons in the output layer, which equals the number of categories), and the activation function that is going to be used in each layer.

In order for PyTorch to know that the model has certain layers, you need to assign each layer to an attribute of the model. That is why we create the self.input_layer and self.output_layer attributes. Note that for hidden layers we use a different approach. We create a list of layers with nn.ModuleList(), since their number is configurable through the num_hidden_layers parameter. For every layer we use nn.Linear, which creates a simple layer with a defined number of neurons. Layers of this type perform the simple function y = wx + b.
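As a rough illustration of the constructor described above, here is a minimal sketch of such a class. It is not the article's original listing: the parameter names other than num_hidden_layers, and the exact way the layers are chained, are assumptions. A forward method is included only so that the class can be run.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FFNN(nn.Module):
    """Sketch of a configurable feed-forward network (parameter names are assumptions)."""

    def __init__(self, input_size, num_hidden_layers, hidden_size, out_size,
                 activation=F.relu):
        super().__init__()
        self.activation = activation
        self.input_layer = nn.Linear(input_size, hidden_size)
        # nn.ModuleList registers every layer with the model; a plain Python
        # list would hide their parameters from model.parameters().
        self.hidden_layers = nn.ModuleList(
            [nn.Linear(hidden_size, hidden_size) for _ in range(num_hidden_layers)]
        )
        self.output_layer = nn.Linear(hidden_size, out_size)

    def forward(self, images):
        x = torch.flatten(images, start_dim=1)   # (batch, 1, 28, 28) -> (batch, 784)
        x = self.activation(self.input_layer(x))
        for layer in self.hidden_layers:
            x = self.activation(layer(x))
        return self.output_layer(x)              # raw scores, no activation


model = FFNN(input_size=784, num_hidden_layers=3, hidden_size=32, out_size=10)
logits = model(torch.rand(4, 1, 28, 28))
num_params = sum(p.numel() for p in model.parameters())
print(logits.shape, num_params)
```

Counting the parameters is a quick way to check that every layer, including those inside the ModuleList, was registered with the model.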

Apart from this, we need to override one important nn.Module method: forward. This function defines how the input will be processed in our neural network. It basically connects all the layers we defined in the constructor. Let’s examine it in more detail:

First, we flatten the image, meaning we reshape it into a one-dimensional array. We do this because the linear input layer of our neural network expects a vector for each sample, not a 2D image. Then we pass this data through each linear layer and apply the rectifier, or ReLU, activation function. ReLU is the most common choice for hidden layers because it is cheap to compute and tends to train well in practice. It is defined with the formula relu(x) = max(0, x). Note that we don’t use ReLU after the output layer. This is because the output layer produces one raw score per category, and PyTorch's cross-entropy loss turns these scores into probabilities internally. Apart from the forward function, there are various other methods we implement in order to better control the training of the network. Methods training_step and validation_step define what is done during every training and validation pass:
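A minimal sketch of what these two methods could look like is shown below. The TinyClassifier model and the accuracy helper are hypothetical stand-ins added only so that the example runs on its own; the dictionary keys val_loss and val_acc are assumptions as well.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def accuracy(outputs, labels):
    """Hypothetical helper: fraction of predictions that match the labels."""
    return (outputs.argmax(dim=1) == labels).float().mean()


class TinyClassifier(nn.Module):
    """Hypothetical stand-in model; only the two step methods matter here."""

    def __init__(self, accuracy_function):
        super().__init__()
        self.accuracy_function = accuracy_function
        self.layer = nn.Linear(28 * 28, 10)

    def forward(self, images):
        return self.layer(torch.flatten(images, start_dim=1))

    def training_step(self, batch):
        images, labels = batch
        outputs = self(images)                    # calls forward under the hood
        return F.cross_entropy(outputs, labels)   # loss used for backpropagation

    def validation_step(self, batch):
        images, labels = batch
        outputs = self(images)
        loss = F.cross_entropy(outputs, labels)
        acc = self.accuracy_function(outputs, labels)
        # detach(): the validation loss is only reported, never backpropagated.
        return {"val_loss": loss.detach(), "val_acc": acc}


batch = (torch.rand(8, 1, 28, 28), torch.randint(0, 10, (8,)))
model = TinyClassifier(accuracy)
train_loss = model.training_step(batch)
val_result = model.validation_step(batch)
print(train_loss.item(), val_result)
```

Returning the raw loss tensor from training_step keeps the computation graph attached, which is what allows a trainer to call backward on it later.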

The training_step function takes a batch of images provided by the DataLoader and pushes them through the network to get predictions. Underneath, PyTorch uses the forward function for this. Once this is done, we measure how well the neural network performed by calculating the loss. Different functions can be used to measure the difference between predicted data and real data; in this example, we use cross-entropy. The validation_step method looks similar, but it also calculates the accuracy of our predictions using the accuracy_function we passed through the constructor, and returns a dictionary with the loss and the accuracy. Once the validation epoch ends, we combine all of these results and store them in an array, so we can see the history of the training process. Also, at the end of every epoch, we print out the information validation_step returned. The last two pieces of functionality are implemented in the validation_epoch_end and epoch_end methods:
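The epoch-level bookkeeping could be sketched as follows. The functions are written as standalone functions rather than methods to keep the example short, and the val_loss and val_acc keys are assumptions, not names confirmed by the article.

```python
import torch


def validation_epoch_end(outputs):
    """Average the per-batch validation results into one record for the epoch."""
    epoch_loss = torch.stack([o["val_loss"] for o in outputs]).mean()
    epoch_acc = torch.stack([o["val_acc"] for o in outputs]).mean()
    return {"val_loss": epoch_loss.item(), "val_acc": epoch_acc.item()}


def epoch_end(epoch, result):
    """Print the summary of one epoch."""
    print(f"Epoch {epoch}: val_loss={result['val_loss']:.4f}, "
          f"val_acc={result['val_acc']:.4f}")


# Two made-up batch results, standing in for what validation_step would return.
batch_results = [
    {"val_loss": torch.tensor(0.5), "val_acc": torch.tensor(0.75)},
    {"val_loss": torch.tensor(1.5), "val_acc": torch.tensor(0.25)},
]

history = []
result = validation_epoch_end(batch_results)
history.append(result)   # one entry per epoch builds up the training history
epoch_end(0, result)
```

Note that averaging batch averages gives every batch the same weight, so the result is slightly off when the last batch is smaller than the others.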

In order to automate the training process of the neural networks, we implement one more class, ModelTrainer:

This class has two methods: fit and _evaluate. The fit method is used for training. It receives a model, the number of epochs (the number of times the whole dataset will be passed through the network), the learning rate and the data loaders. For each epoch, we get batches from the loader and run them through the network by calling the training_step method. Then we get the loss and use the backward method to calculate the gradients. Finally, we use the optimizer to update the weights of the network.
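The training loop described above might be sketched like this. The argument order of fit, the choice of SGD as the optimizer, and the TinyModel stand-in with its method bodies are assumptions made so that the example is self-contained; it trains on random data purely to show the mechanics.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


class TinyModel(nn.Module):
    """Hypothetical stand-in exposing the step methods the trainer relies on."""

    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(28 * 28, 10)

    def forward(self, images):
        return self.layer(torch.flatten(images, start_dim=1))

    def training_step(self, batch):
        images, labels = batch
        return F.cross_entropy(self(images), labels)

    def validation_step(self, batch):
        images, labels = batch
        outputs = self(images)
        acc = (outputs.argmax(dim=1) == labels).float().mean()
        return {"val_loss": F.cross_entropy(outputs, labels), "val_acc": acc}

    def validation_epoch_end(self, outputs):
        return {key: torch.stack([o[key] for o in outputs]).mean().item()
                for key in ("val_loss", "val_acc")}

    def epoch_end(self, epoch, result):
        print(f"Epoch {epoch}: {result}")


class ModelTrainer:
    def fit(self, epochs, learning_rate, model, train_loader, val_loader,
            opt_func=torch.optim.SGD):
        history = []
        optimizer = opt_func(model.parameters(), lr=learning_rate)
        for epoch in range(epochs):
            model.train()
            for batch in train_loader:
                loss = model.training_step(batch)
                loss.backward()         # calculate gradients
                optimizer.step()        # update the weights
                optimizer.zero_grad()   # clear gradients before the next batch
            result = self._evaluate(model, val_loader)
            model.epoch_end(epoch, result)
            history.append(result)
        return history

    def _evaluate(self, model, val_loader):
        model.eval()
        with torch.no_grad():           # no gradients needed for validation
            outputs = [model.validation_step(batch) for batch in val_loader]
        return model.validation_epoch_end(outputs)


torch.manual_seed(0)
data = TensorDataset(torch.rand(64, 1, 28, 28), torch.randint(0, 10, (64,)))
loader = DataLoader(data, batch_size=16)
model = TinyModel()
weights_before = model.layer.weight.detach().clone()
history = ModelTrainer().fit(2, 0.1, model, loader, loader)
```

Keeping the loop in a separate class means the same trainer can be reused for any model that implements these four methods.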

Alright, those are the classes that describe a general neural network and a general training process. We need to get more specific and utilize these classes for our problem. To do so, we first need to implement an accuracy function:
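One common way to write such a function is sketched below: take the index of the highest score in each row as the predicted class and compare it with the label. The function name and the small hand-made example are assumptions for illustration.

```python
import torch


def accuracy(outputs, labels):
    """Fraction of samples whose highest-scoring class matches the label."""
    _, preds = torch.max(outputs, dim=1)   # index of the largest score per row
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))


# Four samples, three classes; the predicted classes are 0, 2, 1 and 0.
outputs = torch.tensor([[2.0, 0.1, 0.3],
                        [0.2, 0.1, 3.0],
                        [0.5, 1.5, 0.2],
                        [1.0, 0.9, 0.8]])
labels = torch.tensor([0, 2, 1, 2])

acc = accuracy(outputs, labels)   # 3 of 4 predictions are correct
print(acc.item())
```

Because the largest raw score always corresponds to the largest probability, there is no need to apply softmax before picking the predicted class.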

Also, we define a helper function for plotting the history:

Finally, we can put all these pieces together and create an FFNN object. We create a neural network with 3 hidden layers and 32 neurons in each hidden layer. Note that the input size is 28×28=784 and the output size is 10, since we have 10 categories of clothes:

Let’s train it and plot the history and accuracy:

Notice how the loss is getting lower and the accuracy is getting better. In the end, after only 5 epochs, we reached an accuracy of 83%.

Convolutional Neural Networks

If you want to process and classify images, one of the best ways to do so is by using Convolutional Neural Networks. This type of network is, in a way, responsible for the deep learning hype of the past couple of years. In the end, they use feed-forward neural networks, but they have a couple of tricks for image processing. At their core, we find the convolution process. This process is used for detecting features of the images, and this information is then used for classification. Here is what the complete architecture of a Convolutional Neural Network looks like:

First, convolution layers detect features (lines, curves, etc.) of the image using filters. They create so-called feature maps that contain information about where in the image a certain feature is located. These maps are further compressed by the pooling layers, after which they are flattened into a 1D array. Finally, a feed-forward network, which in this context is called fully connected, is used for classification. The PyTorch nn module provides a number of other layer types, apart from the Linear layer that we already used. Here is how we can implement the process described above:
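A minimal sketch of such a network for 28×28 grayscale images is given below. It uses two convolutional layers, max pooling and two fully connected layers; the channel counts, kernel sizes, padding and the size of the first fully connected layer are assumptions and not the article's original values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CNN(nn.Module):
    """Sketch of a small convolutional classifier (layer sizes are assumptions)."""

    def __init__(self, out_size=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)    # 1x28x28  -> 16x28x28
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)   # 16x14x14 -> 32x14x14
        self.fc1 = nn.Linear(32 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, out_size)

    def forward(self, images):
        # Convolution builds feature maps; pooling halves their width and height.
        x = F.max_pool2d(F.relu(self.conv1(images)), 2)   # -> 16x14x14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)        # -> 32x7x7
        x = torch.flatten(x, start_dim=1)                 # -> 1568 values per image
        x = F.relu(self.fc1(x))                           # fully connected part
        return self.fc2(x)                                # raw scores per category


model = CNN()
batch = torch.rand(4, 1, 28, 28)
logits = model(batch)
pooled = F.max_pool2d(F.relu(model.conv1(batch)), 2)
print(logits.shape, pooled.shape)
```

The input size of the first fully connected layer has to match the flattened output of the last pooling step, which is why 32 * 7 * 7 appears in the constructor.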

The differences from the FFNN are located in the constructor and the forward method. We know upfront which layers we want to use, so we add two convolutional layers using the Conv2d class and two fully connected layers using the Linear class, like before. In the forward function, we use the max_pool2d function to perform max pooling. The other methods are the same as in the FFNN implementation. We can utilize the ModelTrainer that we already implemented before and train this network:

We got somewhat better results than with the feed-forward neural network: the accuracy is 88%. We can further improve these results by adding more convolutional layers, training the network longer, and modifying the learning rate. Give it a try.


In this article, we presented two implementations of neural networks for image classification using PyTorch. We explored some basic neural network concepts and learned about Convolutional Neural Networks.


Thank you for reading!

Nikola M. Zivkovic


CAIO at Rubik's Code

Nikola M. Zivkovic is the CAIO at Rubik’s Code and the author of the book “Deep Learning for Programmers“. He loves knowledge sharing, and he is an experienced speaker. You can find him speaking at meetups, conferences and as a guest lecturer at the University of Novi Sad.

Rubik’s Code is a boutique data science and software service company with more than 10 years of experience in Machine Learning, Artificial Intelligence & Software development. Check out the services we provide.