When they were first presented back in 2014, Generative Adversarial Networks (GANs) took the world of Deep Learning by storm. Their two-part architecture opened up the path to many creative solutions and combinations. Even Yann LeCun concluded that this is “the most interesting idea in the last 10 years in Machine Learning”. Since then, the GAN zoo has grown a lot. New architectures that harness this adversarial premise are created on a regular basis. One of those solutions is Adversarial Autoencoders (AAE).
They introduced Autoencoder concepts into the GAN architecture. To be more precise, they modified some of the ideas that Variational Autoencoders were already using to fit better into the structure of a standard GAN. Basically, they feed the encoded output (the code) into the GAN’s Discriminator. The code itself is not like the one in standard Autoencoders; it consists of a mean value and a standard deviation, like in Variational Autoencoders.
However, instead of a KL-divergence term, they use adversarial training to push the encoder output toward a chosen prior distribution. In a nutshell, Adversarial Autoencoders force the encoder output to follow a known distribution. This way we get continuous data in latent space, and the code is spread over the prior distribution. You can find more details about everything we just laid out in the previous article. In this article, we will focus more on the implementation itself and the practical use of these structures. So, let’s start with prerequisites and the dataset.
Technologies, Dataset and Helpers
Now that we have had a small recap of how Adversarial Autoencoders work, let’s check out the technologies and data that we will use in this example. The other thing we need to cover before we dive into the AAE code is the implementation of one helper class that is used for image manipulation. This example is implemented using Python 3.6.5 and TensorFlow 1.10.0.
The cool thing about TensorFlow 1.10.0 is that Keras is incorporated within it, so Keras will be used as the high-level API. If we take into consideration that recent announcements about TensorFlow 2.0 imply that the contrib module will be removed, this seems like a good choice. If you need help with TensorFlow installation, follow this article.
For the dataset, we will use the one we used in previous experiments as well: the Fashion-MNIST dataset. Samples from this dataset are displayed in the image above. This dataset is much like the standard MNIST dataset that we used in some other articles. The difference is that instead of handwritten digits, this dataset contains images of clothes. Like the standard MNIST dataset, it is composed of 60,000 training images and 10,000 testing images. All images are size-normalized to 28×28 and centered.
You might recognize the image manipulation class from previous articles too: ImageHelper. Essentially, it is a class with two functions. The first one, save_image, saves a generated image to the defined file location. The second one, makegif, creates a .gif file from the images in a folder. Check out the code below:
The complete implementation of the Adversarial Autoencoder is located in one Python class, AAE. You can check the code of the whole class in the gist below:
There are several important points that we need to explain in more detail. It is a lot of code, so we will split it into separate sections to explain them better. Of course, everything starts with the constructor, so let’s first inspect it:
Apart from initializing the class’s properties for the image helper and image shape, one additional property is created. This is the latent_dimension property, and it defines the size of the encoder output. The constructor is also in charge of calling the functions that build the encoder, decoder and discriminator models and merge them all together into one unified graph. So, let’s see what each of those functions is doing. We will start with the _build_encoder_model function.
This code is very similar to the vanilla Autoencoder implementation that we could see here, but with one major difference. Basically, the first thing we do is flatten the image and pass it through a couple of standard Dense layers. Then we create two outputs, one for the mean value and the other for the standard deviation, again using the Dense class. These are then put together into one output with the help of the merge function. This process is represented with the image below:
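The merge of the two outputs can also be sketched in a few lines of NumPy. This is a minimal sketch rather than the class's actual code: it assumes the second Dense output is treated as a log-variance (a common convention in Variational Autoencoders that keeps the standard deviation positive), and the names sample_code and log_var, as well as the latent size of 10, are hypothetical.

```python
import numpy as np

def sample_code(mean, log_var, rng):
    """Draw one latent code per row: z = mean + std * eps, with eps ~ N(0, 1)."""
    std = np.exp(0.5 * log_var)             # log-variance -> standard deviation (always positive)
    eps = rng.standard_normal(mean.shape)   # Gaussian noise with the same shape as the mean
    return mean + std * eps

rng = np.random.default_rng(0)
latent_dimension = 10                        # hypothetical size, not taken from the article
mean = np.zeros((4, latent_dimension))       # stand-in for the encoder's mean output
log_var = np.zeros((4, latent_dimension))    # stand-in for the second output; 0 means std = 1
code = sample_code(mean, log_var, rng)
print(code.shape)  # (4, 10)
```

Because the noise is drawn separately and only scaled and shifted by the two outputs, gradients can still flow through both Dense layers during training.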
In this case, inside of the merge function, we used the formula for the Gaussian distribution. Basically, we are using this distribution as our prior distribution, i.e. we are forcing the encoder’s output to follow a Gaussian distribution. This output is used as an input to both the discriminator model and the decoder model. Let’s see how the decoder model is built in _build_decoder_generator_model:
As you can see, the Decoder has a double purpose. It is in charge of decoding data that was encoded by the Encoder model, and while it does that it behaves as the Generator as well. The implementation is fairly simple: we pass the information from the code through a few Dense layers and finally reshape it into an image. The last model that is built is the Discriminator model, and that is done in the _build_discriminator_model method.
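The decoder's data flow described above (code, Dense layers, reshape) can be illustrated without a deep learning framework. The following is a rough NumPy sketch of the forward pass only, under stated assumptions: one hidden layer of 512 units, a ReLU activation, a tanh output and a latent size of 10 are all hypothetical choices, and the weights are random rather than trained.

```python
import numpy as np

def dense(x, weights, bias):
    """Affine map, i.e. what a Dense layer computes before its activation."""
    return x @ weights + bias

def decode(code, params, image_shape=(28, 28, 1)):
    """Map codes of shape (batch, latent_dim) to images of shape (batch, 28, 28, 1)."""
    hidden = np.maximum(0.0, dense(code, params["w1"], params["b1"]))  # ReLU hidden layer
    flat = np.tanh(dense(hidden, params["w2"], params["b2"]))          # tanh keeps pixels in [-1, 1]
    return flat.reshape((-1,) + image_shape)                           # flat vector -> image

rng = np.random.default_rng(0)
latent_dim, hidden_units = 10, 512   # hypothetical sizes
params = {
    "w1": rng.normal(0, 0.05, (latent_dim, hidden_units)),
    "b1": np.zeros(hidden_units),
    "w2": rng.normal(0, 0.05, (hidden_units, 28 * 28)),
    "b2": np.zeros(28 * 28),
}
images = decode(rng.standard_normal((3, latent_dim)), params)
print(images.shape)  # (3, 28, 28, 1)
```

Feeding the function codes sampled from the prior, instead of codes produced by the encoder, is what lets the same network act as a generator.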
The Discriminator is a simple neural network that is used for classification. At the end of it, a single neuron tells whether its input is fake or real. Finally, let’s connect all these models into a single entity. For this, _build_and_compile_aae is used:
In this method, all models are compiled as well. After this step, we can proceed with training our Adversarial Autoencoder. This class exposes a function for that, the train function. Here is what it looks like:
Apart from running the complete training process, this method takes a snapshot of the generated images every 100 epochs. The training itself is done using small batches of images. The batch size is parametrized, so you can experiment with it when the class is actually used. First, we train the discriminator on the batch, because we want it to have a starting advantage over the generative process. This method stores the loss of both parts of the AAE in the history variable, which can be plotted later on.
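The order of operations in one training iteration can be sketched as follows. This is a simplified, framework-free illustration and not the train method itself: train_sketch and the three callables passed into it are hypothetical stand-ins for the compiled Keras models, and the labeling (codes sampled from the prior count as real, codes produced by the encoder count as fake) is the usual AAE setup assumed here.

```python
import numpy as np

def train_sketch(images, encode, train_discriminator, train_autoencoder,
                 latent_dimension, epochs=3, batch_size=32, seed=0):
    """Run `epochs` iterations: discriminator first, then the encoder/decoder side."""
    rng = np.random.default_rng(seed)
    real = np.ones((batch_size, 1))     # label for codes drawn from the prior
    fake = np.zeros((batch_size, 1))    # label for codes produced by the encoder
    history = []
    for epoch in range(epochs):
        batch = images[rng.integers(0, len(images), batch_size)]  # random mini-batch
        # 1) Discriminator step: prior samples are "real", encoder codes are "fake".
        prior_codes = rng.standard_normal((batch_size, latent_dimension))
        d_loss = 0.5 * (train_discriminator(prior_codes, real)
                        + train_discriminator(encode(batch), fake))
        # 2) Generator step: reconstruct the batch while asking for a "real" verdict.
        g_loss = train_autoencoder(batch, real)
        history.append({"D": d_loss, "G": g_loss})
    return history

# Stand-ins so the sketch runs on its own; they only record the call order.
calls = []
def encode(x):
    return np.zeros((len(x), 10))
def train_discriminator(codes, labels):
    calls.append("D")
    return 0.7
def train_autoencoder(x, labels):
    calls.append("G")
    return 1.2

images = np.zeros((100, 28, 28, 1))
history = train_sketch(images, encode, train_discriminator, train_autoencoder,
                       latent_dimension=10, epochs=2)
print(calls)  # ['D', 'D', 'G', 'D', 'D', 'G']
```

Collecting both losses in a list per iteration keeps them easy to plot afterwards, which mirrors the role of the history variable.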
Now that we have a class that does all this, applying it to a dataset is pretty straightforward. We want our AAE model to generate images that are like the ones from the Fashion-MNIST dataset. Here is how it’s done:
Since it is necessary for the initialization of the AAE class, we first need to create an object of the ImageHelper class and inject it into the AAE constructor along with the other desired parameters. After that, we can call the train function on the desired input data. To get better results, we scale the input data to the -1 to 1 range. Since we are familiar with the dataset, and the dataset is fairly simple, we have done it manually. Alternatively, this could have been done using the scikit-learn library.
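The manual scaling mentioned above can be written in one line. The sketch below assumes the images arrive as unsigned 8-bit pixel values in the 0 to 255 range, which is how Fashion-MNIST is typically distributed; the function name scale_to_unit_range is hypothetical.

```python
import numpy as np

def scale_to_unit_range(images):
    """Map uint8 pixel values in [0, 255] to floats in [-1, 1]."""
    return (images.astype(np.float32) - 127.5) / 127.5

pixels = np.array([[0, 127, 255]], dtype=np.uint8)  # tiny stand-in for an image
scaled = scale_to_unit_range(pixels)
print(scaled.min(), scaled.max())  # -1.0 1.0
```

Subtracting and dividing by 127.5 centers the data around zero, which matches an output layer that produces values between -1 and 1.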
We did this implementation in order to get better results than we got during our experiments with the standard GAN and DCGAN. So let’s see whether we managed to do so. As expected, the starting iteration is terribly bad and the Generator is basically creating only noise:
However, by the 1000th epoch, the AAE starts generating meaningful and quite good images:
The 3000th epoch looks even better:
It seems we got rather good results quite fast. The training process seems more stable as well. Let’s check whether this trend continued at the 10000th epoch:
Things got better, but not much better. Images generated in the last epoch look like this:
In there we can see some good results, but some blurry results as well. Here are all the results in .gif form:
We can even compare it to the results of the vanilla GAN and
In this article, we applied some of the theoretical knowledge we got in the previous article and implemented the Adversarial Autoencoder architecture using Python and TensorFlow. We saw how we can manipulate data in latent space, generated pretty good images using the AAE, and stabilized the training process.
Thank you for reading!
This article is a part of Artificial Neural Networks Series, which you can check out here.
Read more posts from the author at Rubik’s Code.