Generative Adversarial Networks (GANs) shook up the deep learning world. When they first appeared in 2014, they proposed a new and fresh approach to generative modeling and opened the door for new neural network architectures to emerge. Since the standard GAN architecture is composed of two neural networks, we can experiment with different approaches for each of those networks and thus create new and shiny architectures.

The idea is to build an appropriate model for your problem and generate data that can be used in a real-world business scenario. So far, we have had a chance to see how to implement the standard GAN and the Deep Convolutional GAN (which combines CNN concepts with GAN concepts), but the zoo of GAN architectures grows on a daily basis. In the next few articles, we will try to cover some of those architectures.

The first topic on our list is the Adversarial Autoencoder, a type of network that combines Autoencoders with GANs. To be more precise, it draws inspiration from the ideas behind Variational Autoencoders and merges them with GAN concepts. Before we dive into the details of this architecture, let's briefly remind ourselves how Autoencoders work and how Variational Autoencoders do their thing.

## Autoencoders

Autoencoders look really similar to standard feed-forward neural networks, but their main goal differs. Because of this similarity, we can apply the same learning techniques, like backpropagation, to this type of network as well. However, just like Self-Organizing Maps and Restricted Boltzmann Machines, Autoencoders use concepts of unsupervised learning. This is possible because of their architecture, which can be seen in the image below. To sum it up, during training they attempt to copy the input information to the output. Their entire goal is to encode information about the input in the middle of the architecture, and reconstruct it as faithfully as possible at the output.

The first segment of the Autoencoder, up until the middle of the architecture, is used for encoding information and is usually called the **encoder**. Mathematically, we can write this down as *f(x)*. The hidden layer in the middle of the architecture is called **the code** or **encoded vector**, and it is the result of the encoding, which can be written down as *h = f(x)*. The last segment, from the middle to the output, is called the **decoder**, and it reconstructs the information from the code, thus *y = g(h) = g(f(x))*.

In essence, the Autoencoder receives the data at the input layer, propagates it to the middle layer, and then reconstructs the same data (ideally) at the output. This means that the encoded data in the middle of the Autoencoder (the code) is actually the most important thing for us, because it contains the input data with reduced dimensionality. Check this article if you want to learn more about Autoencoders, and here you can find how to implement them.
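The *f*/*g* composition above can be sketched with plain numpy. This is a minimal, untrained illustration with hypothetical dimensions (an 8-dimensional input compressed to a 3-dimensional code), not a trainable implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 8-dimensional input, 3-dimensional code.
input_dim, code_dim = 8, 3

# Randomly initialized (untrained) encoder and decoder weights.
W_enc = rng.standard_normal((input_dim, code_dim))
W_dec = rng.standard_normal((code_dim, input_dim))

def encoder(x):
    # h = f(x): project the input down to the code.
    return np.tanh(x @ W_enc)

def decoder(h):
    # y = g(h): reconstruct the input from the code.
    return h @ W_dec

x = rng.standard_normal((1, input_dim))
h = encoder(x)   # the code, with reduced dimensionality
y = decoder(h)   # y = g(f(x)), the reconstruction

print(h.shape, y.shape)  # (1, 3) (1, 8)
```

Training would adjust `W_enc` and `W_dec` (via backpropagation) so that `y` matches `x` as closely as possible.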

## Variational Autoencoders

Variational Autoencoders (VAEs) are generative models themselves. In one of the next articles, we will inject this architecture into a GAN too, but here let's just provide a brief introduction. The whole point of this type of Autoencoder is to generate random, new output that is highly similar to the training data. Sounds like the job of a GAN's Generator model, right? Another thing VAEs are used for is to alter and explore different variations of the training data, but in a more supervised manner.

Standard Autoencoders have a big problem in a generative context. Basically, they convert input data into an encoded vector that lies in a latent space which is not continuous. What does this mean? Well, because we don't use a VAE just to reconstruct the input, but to generate data that is similar to the input as well, we don't want clustering of the data in the latent space (which will inevitably happen with standard Autoencoders). We need a continuous encoder output, so we can pick variations from it. As a solution to this problem, Variational Autoencoders create a slightly different encoded vector. Check the image above.

First, a VAE encodes the input data into two parts: the mean value **μ** and the standard deviation **σ** of the processed input. Then it samples from this distribution to create the sampled encoding vector that is passed to the decoder. This can be explained like this: a standard Autoencoder gives a vector that "points" to the encoded value in the latent space, while a VAE, on the other hand, creates an output that points to the "area" where the encoded value can be, as presented in the image below. The mean value controls where the center of the encoding is located, and the standard deviation defines the "area" in which the encoding can vary from the mean.
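This sampling step is usually written as *z = μ + σ · ε* with *ε* drawn from a standard normal distribution (the so-called reparameterization trick). A small numpy sketch, with made-up values for **μ** and **σ**:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder outputs for one input: the mean and standard
# deviation of the latent "area" described above.
mu = np.array([0.5, -1.0])
sigma = np.array([0.1, 0.3])

def sample_encoding(mu, sigma, rng):
    # Sample eps ~ N(0, I), then shift and scale it, so the sampled
    # code stays centered on mu and varies within sigma.
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

z = sample_encoding(mu, sigma, rng)  # the sampled encoding vector
```

Every call returns a slightly different `z`, which is exactly what makes the latent space continuous around `mu`.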

Ideally, this approach should give us encoded vectors that are as close to each other as possible. As a consequence, this allows smooth interpolation and the generation of new samples. To make this ideal scenario as likely as possible, Variational Autoencoders utilize the Kullback–Leibler divergence (KL divergence) as an extra term in their loss. Here, we will close the VAE chapter for now, because this is the place where Adversarial Autoencoders take over.
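For a diagonal Gaussian encoding and a standard normal target, this KL term has a well-known closed form, which we can sketch in numpy:

```python
import numpy as np

def kl_to_standard_normal(mu, sigma):
    # Closed-form KL divergence between a diagonal Gaussian N(mu, sigma^2)
    # and the standard normal N(0, I), summed over latent dimensions.
    return 0.5 * np.sum(mu**2 + sigma**2 - np.log(sigma**2) - 1.0)

# The divergence vanishes when the encoding already matches N(0, I)
# and grows as the encoding drifts away from it.
no_penalty = kl_to_standard_normal(np.zeros(2), np.ones(2))          # 0.0
penalty = kl_to_standard_normal(np.array([2.0, 0.0]), np.ones(2))    # 2.0
```

Minimizing this term alongside the reconstruction loss is what pulls the encoded vectors toward a common, overlap-friendly region of the latent space.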

## Adversarial Autoencoder Architecture

The Adversarial Autoencoder has the same aim, but a different approach, meaning that this type of autoencoder also aims for continuous encoded data, just like a VAE. However, it uses a **prior distribution** to control the encoder output. The encoded vector is still composed of a mean value and a standard deviation, but we now use the prior distribution to model it. Basically, we force the encoder output to follow a known distribution, producing encodings that have no gaps. This way, the output will be evenly distributed over the prior distribution.

We can use any type of distribution as the prior, such as a normal (Gaussian) distribution, a gamma distribution, etc. The important thing is that we push the distribution of the encoded values toward the prior distribution. As a consequence, the decoder learns only the mapping from the prior distribution to the data distribution. Now, we use this modified Autoencoder as the Generator model in our GAN network. Take a look at this image:
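Concretely, the discriminator's "real" examples are simply draws from the chosen prior, while the "fake" examples are the encoder's outputs. A small numpy sketch of where the two batches come from (the untrained linear encoder here is just a stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)
code_dim, batch = 2, 64

# "Real" samples for the discriminator come from the chosen prior p(z),
# here a standard 2-D Gaussian (any tractable distribution would do).
prior_samples = rng.standard_normal((batch, code_dim))

# "Fake" samples are the encoder's outputs q(z|x) for a batch of inputs;
# stand-in: an untrained linear encoder applied to random data.
W_enc = rng.standard_normal((8, code_dim))
x = rng.standard_normal((batch, 8))
encoded = x @ W_enc

# Training the discriminator to tell these apart is what pushes the
# encoder to make `encoded` indistinguishable from `prior_samples`.
```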

As you can see, we use the encoded value as an input to the Discriminator model of the GAN as well as to the decoder. Samples from the prior distribution are also input to the Discriminator model. In the learning process, we first train the Generator model (encoder and decoder) in the *reconstruction phase*. The aim is to minimize the reconstruction loss and get a good output from the decoder. After that, we proceed with training the Discriminator, using both the prior samples and the encoded values of the Autoencoder, in the *regularization phase*.
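The two-phase loop can be sketched structurally as follows. The update functions are hypothetical placeholders; in a real implementation a framework such as TensorFlow or PyTorch would compute gradients and update the weights in each step:

```python
import numpy as np

rng = np.random.default_rng(0)

def reconstruction_step(batch, reconstructed):
    # Reconstruction phase: update encoder + decoder to minimize
    # the reconstruction error ||x - g(f(x))||^2.
    return float(np.mean((batch - reconstructed) ** 2))

def regularization_step(prior_batch, encoded_batch):
    # Regularization phase: update the discriminator on prior ("real")
    # vs encoded ("fake") samples, then update the encoder to fool it.
    return 0.0  # placeholder for the adversarial losses

W_enc = rng.standard_normal((8, 2))
W_dec = rng.standard_normal((2, 8))

for epoch in range(3):
    batch = rng.standard_normal((32, 8))
    encoded = batch @ W_enc                 # q(z|x): encoder output
    reconstructed = encoded @ W_dec         # g(f(x)): decoder output
    prior = rng.standard_normal((32, 2))    # samples from p(z)

    rec_loss = reconstruction_step(batch, reconstructed)
    adv_loss = regularization_step(prior, encoded)
```

The key design point is that the same encoder weights are touched in both phases: once to reconstruct well, and once to match the prior.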

Mathematically, we can write down this entire architecture in the following manner. We mark the input data with *x* and the encoded vector (the result of the encoder) with *z*. We have several distributions to address here. The prior distribution that we want to impose on *z* is defined as *p(z)*, while *q(z|x)* and *p(x|z)* are the encoding and decoding distributions, respectively. Then we have the data distribution, marked with *pd(x)*, and the model distribution, *p(x)*. Then we can write down the formula for the aggregated posterior distribution *q(z)* on the encoding vector like this:
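Using the definitions above, the aggregated posterior averages the encoding distribution over the data distribution:

```latex
q(z) = \int_{x} q(z \mid x)\, p_d(x)\, dx
```

It is this *q(z)* that the adversarial training pushes toward the prior *p(z)*.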

In this way, the Adversarial Autoencoder uses the adversarial process to guide the encoded distribution, instead of using KL divergence like a VAE does.

## Conclusion

In this article, we saw how it is possible to use GAN's two-fold architecture to create new architectures. In this case, we used an Autoencoder (or rather its encoder) as the Generator model. We also learned about the problems Autoencoders face in the latent space when used for generative purposes. One solution is provided by Variational Autoencoders, but the Adversarial Autoencoder provides a more flexible one. In the next article, we will implement this architecture using Python.

Thank you for reading!

This article is a part of Artificial Neural Networks Series, which you can check out **here**.

Read more posts from the author at **Rubik’s Code**.
