In one of the previous articles, we started our journey into the world of Autoencoders. We saw that they are a special kind of neural network that is able to use techniques of supervised learning for unsupervised learning. One might say that they are doing some sort of self-supervised learning. How so? Well, in essence they are feedforward neural networks that use the same concepts as all other neural networks, such as neurons and weighted connections.
The trick is that their output data should be the same as their input data. In fact, that is their main goal. This might sound a little bit confusing, so let’s observe the image below. There we can see one example of an Autoencoder, or more precisely, one example of an Undercomplete Autoencoder. These symmetrical, hourglass-like architectures have fewer neurons in the middle layer than in the input and output layers. In practice, we can have more than one hidden layer, and then we have so-called Deep Autoencoders. The closer a layer is to the middle of the network, the fewer neurons it has.
The nature of Autoencoders is to encode information, that is, to compress it. In our example from the image above, the encoded information is located in the middle layer, which is sometimes called the code. This is the most interesting information for us. The first part of the Autoencoder is called the encoder and can be represented with the function f(x), where x is the input information.
The code is the result of the encoding and can be represented like this: h = f(x). Finally, the part of the architecture after the middle layer is called the decoder, and it produces the reconstruction of the data: y = g(h) = g(f(x)). To sum it up, the Autoencoder receives input data, encodes it in the middle layer and then tries to reproduce the same data on the output layer. Sometimes the middle layer can have more neurons than the input and output layers. Then we are dealing with Overcomplete Autoencoders. You can read more about other types of Autoencoders here.
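The NumPy sketch below makes h = f(x) and y = g(h) = g(f(x)) concrete with the forward pass of an undercomplete Autoencoder. It is not one of the implementations used later in this article. It makes three assumptions:

- The encoder and the decoder each have one weight matrix and one bias vector.
- Both the code and the output use a sigmoid activation.
- The weights are random and untrained, so y will not yet resemble x.

The names W_enc, W_dec, f and g are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
input_dim, code_dim = 784, 32  # undercomplete: code_dim < input_dim

# Hypothetical (untrained) parameters for the encoder f and the decoder g
W_enc = rng.normal(0.0, 0.01, size=(input_dim, code_dim))
b_enc = np.zeros(code_dim)
W_dec = rng.normal(0.0, 0.01, size=(code_dim, input_dim))
b_dec = np.zeros(input_dim)

def f(x):
    """Encoder: maps the input x to the code h."""
    return sigmoid(x @ W_enc + b_enc)

def g(h):
    """Decoder: maps the code h back to a reconstruction y."""
    return sigmoid(h @ W_dec + b_dec)

x = rng.random(input_dim)  # stand-in for one flattened 28x28 image
h = f(x)                   # h = f(x)
y = g(h)                   # y = g(h) = g(f(x))
print(x.shape, h.shape, y.shape)  # (784,) (32,) (784,)
```

For an Overcomplete Autoencoder, only code_dim changes: it would be larger than input_dim.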
In this article, we are going to explore three ways to implement the mentioned architecture. For this purpose, we will use the Fashion-MNIST dataset, which is described in more detail in the next section. Apart from that, we will use Python 3.6.5 and TensorFlow 1.10.0.
Instead of using the standard MNIST dataset, as in some previous articles, in this article we will use the Fashion-MNIST dataset. This dataset has the same structure as the MNIST dataset, i.e. it has a training set of 60,000 images and a test set of 10,000 images, but instead of handwritten digits the images show clothes. All images have been size-normalized and centered. The size of the images is also fixed to 28×28 grayscale pixels, so image preprocessing is minimal. Here is what it looks like:
There are 10 categories of clothes in this dataset, but classification is not our interest here. We will try to create an Autoencoder that compresses and decompresses these images. The goal of this article is to get more comfortable with the Autoencoder architecture and to see whether Autoencoders are any good at image compression. We will start with the implementation that uses the low-level TensorFlow API.
Implementation Using Low-Level TensorFlow API
In this example, we implement an Autoencoder that has three layers: the input layer, the output layer and one middle layer. The Autoencoder functionality is implemented inside the Autoencoder class. Here is the implementation using the low-level TensorFlow API:
Let’s examine this class for a bit. As you can see, there are four main parts we need to discuss here: the constructor, the train method, the getEncodedImage method and the getDecodedImage method. Through the constructor, we get the dimensions of the input and output layers, as well as the dimension of the middle layer. We also define all variables that are necessary for our TensorFlow graph in the constructor:
The learning rate is hardcoded to 0.1, but if you want to, you can pass this value as a constructor parameter as well. Here we define weight values for all connections and biases for the neurons in the middle layer and in the output layer. After that, the neural network is defined:
For the middle layer, we use a simple sigmoid activation function. We also define the _real_output placeholder, which holds the expected output values (as you know, in the case of Autoencoders these are the same as the input values). For the error function, mean squared error is used in combination with the Adagrad optimizer. In the end, a TensorFlow session is created.
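The NumPy sketch below roughly illustrates what the constructor sets up. It is a hypothetical stand-in, not the TensorFlow code. It creates the same kinds of variables:

- weights for the input-to-middle and middle-to-output connections;
- biases for the middle and output layers;
- a learning rate passed as a constructor parameter, with 0.1 as the default.

The random-normal initialization with a standard deviation of 0.1 is an assumption.

```python
import numpy as np

class AutoencoderParams:
    """Hypothetical NumPy stand-in for the variables the constructor creates."""

    def __init__(self, input_dim, hidden_dim, output_dim, learning_rate=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.learning_rate = learning_rate  # 0.1 by default, as in the article
        # Weights for input -> middle layer connections, plus middle-layer biases
        self.W_hidden = rng.normal(0.0, 0.1, size=(input_dim, hidden_dim))
        self.b_hidden = np.zeros(hidden_dim)
        # Weights for middle -> output layer connections, plus output-layer biases
        self.W_output = rng.normal(0.0, 0.1, size=(hidden_dim, output_dim))
        self.b_output = np.zeros(output_dim)

    def parameter_count(self):
        """Total number of trainable values (weights and biases)."""
        return sum(p.size for p in (self.W_hidden, self.b_hidden,
                                    self.W_output, self.b_output))

params = AutoencoderParams(784, 32, 784)
print(params.parameter_count())  # 50992
```

With 784 input and output neurons and 32 middle neurons, that is 784·32 + 32 + 32·784 + 784 = 50,992 trainable values.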
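The error function can be sketched in a few lines of NumPy. The name mean_squared_error is illustrative. The key point is that the expected output (what _real_output holds) is simply the input itself, so a perfect reconstruction gives an error of zero.

```python
import numpy as np

def mean_squared_error(real_output, reconstruction):
    """Average of the squared differences over all samples and pixels."""
    return np.mean((real_output - reconstruction) ** 2)

# For an Autoencoder the expected output is the input itself
x = np.array([[0.0, 1.0, 0.5, 0.5]])
reconstruction = np.array([[0.5, 0.5, 0.5, 0.5]])
print(mean_squared_error(x, reconstruction))  # 0.125
```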
In the train method, this Autoencoder is trained. First, all global variables are initialized within the defined session. Then the _training operation is run on data from the dataset to minimize the error:
Finally, let’s take a look at the getEncodedImage and getDecodedImage methods. They are pretty straightforward. The first one returns the result of the encoding process, and the second one returns the result of the decoding process. Note that the getDecodedImage method takes the original image as input, not the encoded one. Here is what they look like:
Using this class is very simple: we just need to call the constructor, followed by the train method. However, to use it on the Fashion-MNIST dataset, we need to modify the data a bit, because, as you can see, the input of the Autoencoder is defined as a flat array of values, while the dataset contains 28×28 images. Here is the usage example:
The cool thing about this dataset is that it ships with Keras, which is part of TensorFlow 1.10.0. Keras is a high-level API, and since it is included in TensorFlow, we don’t need to install it as a separate library, which makes our lives a bit easier. So, we import data from this dataset and then reshape each image into an array. After that, we create an instance of the Autoencoder. For the middle layer, we use 32 neurons, meaning we are compressing an image from 784 (28×28) values to 32 values. Finally, we train the Autoencoder, get the decoded images and plot the results. Here’s what we get:
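The NumPy sketch below shows on toy data what minimizing the error with Adagrad amounts to. It is a stand-in for the TensorFlow code, not a copy of it:

- The gradients are written by hand instead of being derived automatically by TensorFlow.
- The output layer is assumed to be linear.
- The squared-gradient accumulator starts at 0.1, mirroring the default initial_accumulator_value of TensorFlow 1.x's AdagradOptimizer.
- Creating the params dictionary plays the role of initializing the global variables.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
X = rng.random((64, 8))  # toy stand-in for flattened images
n_in, n_hidden = X.shape[1], 3

# "Initializing the variables": weights and biases for both layers
params = {
    "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
    "W2": rng.normal(0.0, 0.1, (n_hidden, n_in)), "b2": np.zeros(n_in),
}
# Adagrad keeps a running sum of squared gradients per parameter
accum = {k: np.full_like(v, 0.1) for k, v in params.items()}
learning_rate = 0.1

def forward(X):
    h = sigmoid(X @ params["W1"] + params["b1"])  # sigmoid middle layer
    y = h @ params["W2"] + params["b2"]           # linear output (assumption)
    return h, y

def train_step(X):
    h, y = forward(X)
    loss = np.mean((y - X) ** 2)                  # target is the input itself
    dy = 2.0 * (y - X) / X.size                   # dLoss/dy
    dz1 = (dy @ params["W2"].T) * h * (1.0 - h)   # back through the sigmoid
    grads = {"W1": X.T @ dz1, "b1": dz1.sum(axis=0),
             "W2": h.T @ dy, "b2": dy.sum(axis=0)}
    for k in params:                              # Adagrad update
        accum[k] += grads[k] ** 2
        params[k] -= learning_rate * grads[k] / np.sqrt(accum[k])
    return loss

losses = [train_step(X) for _ in range(500)]
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

Adagrad divides each parameter's step by the square root of its accumulated squared gradients. As a result, parameters that keep receiving large gradients take progressively smaller steps.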
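The NumPy sketch below shows the reshaping step. Random pixels stand in for Fashion-MNIST, since loading the real data requires a download. Scaling pixel values to [0, 1] is an assumed, common preprocessing step that the text does not mention.

```python
import numpy as np

# Stand-in for a batch of Fashion-MNIST images: 8-bit pixels, shape (N, 28, 28)
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28)).astype(np.uint8)

# Flatten each 28x28 image into a 784-element vector (scaling is an assumption)
flat = images.reshape(len(images), 28 * 28).astype("float32") / 255.0
print(flat.shape)  # (5, 784)

# Reshape back to 28x28 for plotting a (decoded) image
restored = flat.reshape(-1, 28, 28)

# The middle layer holds 32 values per image instead of 784
print(784 / 32)  # 24.5
```

The ratio 24.5 counts values, not bits. If the 32 code values are stored as 32-bit floats, they take 128 bytes, compared with 784 bytes for the original 8-bit pixels.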
Implementation Using Keras
As mentioned, Keras is part of TensorFlow 1.10.0, the version we use. It has been announced that the contrib module of TensorFlow will be removed in version 2.0 of this library and that its use cases will be moved to Keras. So, this is a good moment to get familiar with it. This shouldn’t be too big of a deal, since the Keras API is very user-friendly and easy to use. That is why we reimplemented the Autoencoder class using this library:
The API of the Autoencoder class stayed pretty much the same. The only difference is that the getDecodedImage method now receives the encoded image as input. Another thing you can notice is that the code is much cleaner. From the layers module of Keras, the Dense and Input classes are used, and from the models module, the Model class is imported. The Model class is used to represent the neural network. We use it to create three models: _autoencoder_model, _encoder_model and _decoder_model.
The first one is the most important one, and it is essentially our Autoencoder. The other two are just helpers, used to get the encoded and the decoded image in the respective methods. The Dense and Input classes represent layers of the neural network: the Input class is used for the input layer and Dense for all other layers. The best thing about the created model is that we can train it very easily, just by calling its fit method and passing the input data to it (as both the input and the expected output). That is exactly what we do in the train method. More information about how to use Keras for neural network development can be found here.
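Keras lets several Model objects be built from the same layer instances. Presumably that is how the helper models reuse the trained Autoencoder's weights here. The NumPy sketch below mirrors that structure. It is not Keras code, and the class and method names are hypothetical. It has:

- one shared set of weights;
- an encoder that maps the input to the code;
- a decoder that, like the new getDecodedImage, takes the code;
- an autoencoder that is simply the decoder applied to the encoder's output.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SharedLayerAutoencoder:
    """Hypothetical NumPy analogue of three models that share the same layers."""

    def __init__(self, input_dim, code_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One set of weights, used by all three "models" below
        self.W_enc = rng.normal(0.0, 0.1, (input_dim, code_dim))
        self.b_enc = np.zeros(code_dim)
        self.W_dec = rng.normal(0.0, 0.1, (code_dim, input_dim))
        self.b_dec = np.zeros(input_dim)

    def encoder_model(self, x):      # input -> code
        return sigmoid(x @ self.W_enc + self.b_enc)

    def decoder_model(self, code):   # code -> reconstruction
        return sigmoid(code @ self.W_dec + self.b_dec)

    def autoencoder_model(self, x):  # input -> reconstruction
        return self.decoder_model(self.encoder_model(x))

ae = SharedLayerAutoencoder(784, 32)
x = np.random.default_rng(1).random((2, 784))
code = ae.encoder_model(x)        # analogous to getEncodedImage
decoded = ae.decoder_model(code)  # analogous to getDecodedImage, which takes the code
print(np.allclose(decoded, ae.autoencoder_model(x)))  # True
```

Because the weights are shared, anything that trains the full autoencoder also changes what the two helpers return.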
The implemented Autoencoder class is used in the same way as the previous one:
However, the result of this code is not much different from the previous implementation, meaning it is still not good. Here is what we get:
As you can see, the results are a little bit better, but still far from good. It seems that an Autoencoder this simple is not that good at image compression. In the next and final example, we will try to use convolutional layers to achieve better results.
Implementation Using Convolutional Layers
Let’s make our Autoencoder a bit more complicated. Since we are dealing with images, we can try to use the layers that are usually utilized by Convolutional Neural Networks. These layers are also available in the contrib module of TensorFlow, but since this module will be deprecated soon, in this example we use the ones provided by Keras. To be more exact, we use Conv2D, MaxPooling2D and UpSampling2D. To find out more about these layers, you can check out this article. Apart from that, we are going to add more layers in the middle of the architecture. The third incarnation of the Autoencoder class looks like this:
As you can see, the API has changed. The constructor doesn’t receive the input and middle layer dimensions anymore. They are implicit, since we know we are getting 28×28 images. The input layer itself is now shaped like an image; it is no longer just an array of data. This means that we don’t need to reshape images from the dataset anymore, at least not in the way we did for the previous implementations. The encoder then uses three convolutional layers, each followed by a max-pooling layer. This process compresses the image, since each max-pooling layer roughly halves the width and height of the feature maps and fewer and fewer feature detectors are used in each convolutional layer. In the _code_layer, the size of the image is (4, 4, 8), i.e. 128 values.
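The (4, 4, 8) code size follows from the pooling arithmetic. Assume 2×2 max pooling with stride 2 and padding='same'. Then each pooling layer halves the width and height, rounding up: 28 → 14 → 7 → 4. The NumPy sketch below implements such a pooling step and applies it three times.

For simplicity, the sketch keeps 8 channels throughout and ignores the convolutions. Those are assumed to use padding='same' with stride 1, which leaves width and height unchanged. The real encoder's filter counts are not shown here.

```python
import numpy as np

def max_pool_2x2_same(x):
    """2x2 max pooling with stride 2 and 'same' padding on an (H, W, C) array."""
    x = np.asarray(x, dtype=float)
    h, w, c = x.shape
    # 'same' padding pads at the bottom/right so odd sizes round up (7 -> 4)
    x = np.pad(x, ((0, h % 2), (0, w % 2), (0, 0)), constant_values=-np.inf)
    # Group pixels into 2x2 blocks and keep the maximum of each block
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2, c).max(axis=(1, 3))

# Stand-in for 8 feature maps of size 28x28
feature_maps = np.random.default_rng(0).random((28, 28, 8))
for _ in range(3):  # three max-pooling layers
    feature_maps = max_pool_2x2_same(feature_maps)
    print(feature_maps.shape)
# (14, 14, 8)
# (7, 7, 8)
# (4, 4, 8)
print(feature_maps.size)  # 128
```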
After that, the decoding part of the Autoencoder uses a sequence of convolutional and up-sampling layers, which reconstructs the image. Training works the same way as before: we pass the necessary data to the fit method. Another important thing to note is that we no longer have the getEncodedImage method. So, this class can be used like this:
Evidently, the flow is very similar. The data is now adapted for convolutional layers: it is still reshaped, but in a different way. Basically, only one channel is defined for each image, since the images are grayscale. After that, the Autoencoder is trained and the results are plotted:
We played around with a lot of different concepts in this article. The goal was to get more comfortable with the Autoencoder architecture and to try different approaches to implementing it. What we can conclude from this experiment is that these Autoencoders are not very good at image compression, and that in this case using JPEG compression is probably a better idea. However, we saw that Autoencoders can be built in different ways.
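On the decoding side, UpSampling2D with its default nearest-neighbour interpolation repeats every row and column, doubling width and height. Doubling from 4 gives 8, 16 and then 32, not 28, so the size must shrink somewhere. One common arrangement is a single 3×3 convolution with padding='valid' that turns 16 into 14 before the last up-sampling. This is an assumption, not taken from the article's code. The helper names below are hypothetical.

```python
import numpy as np

def up_sample_2x(x):
    """Nearest-neighbour 2x up-sampling of an (H, W, C) array, like UpSampling2D((2, 2))."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def valid_conv_size(n, kernel=3):
    """Spatial size after a convolution with 'valid' padding and stride 1."""
    return n - kernel + 1

code = np.random.default_rng(0).random((4, 4, 8))  # the (4, 4, 8) code
x = up_sample_2x(code)  # (8, 8, 8)
x = up_sample_2x(x)     # (16, 16, 8)
# Assumption: one 3x3 'valid' convolution shrinks 16 -> 14, so the last
# up-sampling step lands exactly on 28x28 (filter counts are not tracked here)
size = valid_conv_size(x.shape[0])
print(x.shape, size, size * 2)  # (16, 16, 8) 14 28
```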
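The convolutional version expects each sample shaped as (height, width, channels). The NumPy sketch below shows that reshape. Random pixels stand in for Fashion-MNIST, and the scaling to [0, 1] is an assumption.

```python
import numpy as np

# Stand-in for Fashion-MNIST data: N grayscale images of shape (28, 28)
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28)).astype(np.uint8)

# Conv2D expects (samples, height, width, channels); grayscale means 1 channel
conv_input = images.reshape(-1, 28, 28, 1).astype("float32") / 255.0
print(conv_input.shape)  # (5, 28, 28, 1)

# np.expand_dims adds the same trailing channel axis
print(np.expand_dims(images, -1).shape)  # (5, 28, 28, 1)
```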
Thank you for reading!
This article is part of the Artificial Neural Networks Series, which you can check out here.
Read more posts from the author at Rubik’s Code.