The code that accompanies this article can be downloaded here.
In the previous article, we started exploring the vast universe of generative algorithms with a gentle introduction to Generative Adversarial Networks, or GANs. This major idea, first presented by Ian Goodfellow from the University of Montreal back in 2014, is still regarded as one of the biggest breakthroughs in the field. Facebook’s AI research director, Yann LeCun, called it “the most interesting idea in the last 10 years in Machine Learning”. Today GANs are even used to generate paintings. Yes, actual art paintings, and they are quite pricey.
Generative algorithms, the group of algorithms to which GANs belong, are a bit special. We might say that their main goal is completely different from that of the discriminative algorithms we have explored in our series on artificial neural networks so far. While discriminative algorithms try to predict a label based on features, generative algorithms start from the label and generate the features. Mathematically, we can say that they model the probability p(x|y), where y is the given output label and x represents the set of features.
The underlying idea is to use two neural networks instead of one. The training and learning process stays the same and utilizes standard techniques (like backpropagation). However, this time we train not one but two models: a Generative Model (G) and a Discriminative Model (D). The Generative Model captures the data distribution and uses some sort of noise signal to generate samples, while the Discriminative Model tries to figure out whether a sample came from the generative model (is it fake) or from the training data (is it real). That looks something like this:
In a way, we could say that these two models are competing against each other. The Generative Model tries to generate data similar to the training set in order to “confuse” the Discriminative Model, while the Discriminative Model tries to improve and recognize whether it is presented with fake data. Mathematically, this means they are playing a two-player minimax game.
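In the formulation from Goodfellow’s original paper, this game is described by the following value function, where D(x) is the discriminator’s estimate that x is real and G(z) is a sample generated from noise z:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right] +
  \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator tries to maximize this value, while the generator tries to minimize it.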
Technologies, Dataset and Helpers
Before we dive into the implementation of GAN and, later on, its deep convolutional variant, let’s take a look at the technologies, the dataset and the helper classes we will use.
Samples from the dataset we use can be viewed in the image above. It is the Fashion-MNIST dataset, which is similar to the standard MNIST dataset we used in some of the previous articles. However, instead of handwritten digits, this dataset contains images of clothes. It has a training set of 60,000 samples and a test set of 10,000 images. As in the MNIST dataset, all 28×28 images have been size-normalized and centered. Finally, let’s check out the implementation of the ImageHelper class:
This class has two functions. The first one, save_image, saves generated images to the defined file location; the epoch number is used to generate the name of the file. The second function combines the saved images into an animation, so we can follow how the generated samples evolve during training.
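A minimal sketch of such a helper could look like the following. The method names, the 5×5 grid layout and the use of Pillow for the animation are assumptions for illustration; the downloadable code is the reference:

```python
import os

import matplotlib
matplotlib.use('Agg')  # headless backend, so this also runs without a display
import matplotlib.pyplot as plt


class ImageHelper:
    def save_image(self, generated, epoch, directory):
        """Save a 5x5 grid of generated samples; the epoch number becomes the file name."""
        fig, axs = plt.subplots(5, 5)
        count = 0
        for i in range(5):
            for j in range(5):
                # rescale from the generator's [-1, 1] output back to [0, 1]
                axs[i, j].imshow(generated[count, :, :, 0] * 0.5 + 0.5, cmap='gray')
                axs[i, j].axis('off')
                count += 1
        os.makedirs(directory, exist_ok=True)
        fig.savefig(os.path.join(directory, f"{epoch}.png"))
        plt.close(fig)

    def make_gif(self, directory):
        """Assemble the saved snapshots into a GIF of the training progress (uses Pillow)."""
        from PIL import Image
        files = sorted((f for f in os.listdir(directory) if f.endswith('.png')),
                       key=lambda f: int(f.split('.')[0]))
        frames = [Image.open(os.path.join(directory, f)).convert('P') for f in files]
        frames[0].save(os.path.join(directory, 'training.gif'),
                       save_all=True, append_images=frames[1:], duration=200, loop=0)
```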
The implementation of standard Generative Adversarial Network is done in the GAN class. Here it is:
That is a lot of code, so let’s describe its main parts. In the beginning, we import all necessary modules and classes. The Keras classes and modules are especially important, so we put them in a special section. The constructor of the GAN class is pretty simple and, in essence, delegates the construction of the Generative Model and the Discriminative Model to specialized functions. Apart from that, it initializes the optimizer; as you can see, the Adam optimizer is used.
The fun happens in the specialized functions: _build_generator_model, _build_and_compile_discriminator_model and _build_and_compile_gan. These functions require our special attention. The first two use standard Keras layers to build the generator and discriminator models.
We can see that standard layers are used. First, the Input class is used to create an input layer. Then the Sequential class is used to describe the rest of the model, in which Dense layers glue LeakyReLU and BatchNormalization layers together. Finally, the output layer uses the tanh activation function, and its result is reshaped into the image dimensions.
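Based on that description, a generator builder might be sketched as follows. The layer sizes here are illustrative assumptions; the downloadable code may use different values:

```python
import numpy as np
from tensorflow.keras.layers import (Input, Dense, LeakyReLU,
                                     BatchNormalization, Reshape)
from tensorflow.keras.models import Model, Sequential


def build_generator_model(latent_dim=100, img_shape=(28, 28, 1)):
    # Sequential stack: Dense layers glue LeakyReLU and BatchNormalization together
    model = Sequential([
        Dense(256),
        LeakyReLU(0.2),
        BatchNormalization(momentum=0.8),
        Dense(512),
        LeakyReLU(0.2),
        BatchNormalization(momentum=0.8),
        Dense(1024),
        LeakyReLU(0.2),
        BatchNormalization(momentum=0.8),
        # tanh keeps generated pixels in [-1, 1], matching the scaled training data
        Dense(int(np.prod(img_shape)), activation='tanh'),
        Reshape(img_shape),
    ])
    noise = Input(shape=(latent_dim,))  # the Input class creates the input layer
    return Model(noise, model(noise))
```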
Here we build a standard classification neural network with the Dense and LeakyReLU classes. The final layer has only one neuron, whose output tells us whether a real or a fake image was fed to the Discriminative Model. It is important to notice that the trainable property of this model is set to false. This is done because, when the combined model is trained, only the Generator Model’s weights should be updated. Another function we need to take a look at is the train function:
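A sketch of such a discriminator builder is shown below. The layer sizes are illustrative assumptions, and the freezing trick relies on Keras capturing the trainable state at compile time (the discriminator itself is compiled while still trainable):

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense, Flatten, LeakyReLU
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam


def build_and_compile_discriminator_model(img_shape=(28, 28, 1)):
    img = Input(shape=img_shape)
    x = Flatten()(img)
    x = Dense(512)(x)
    x = LeakyReLU(0.2)(x)
    x = Dense(256)(x)
    x = LeakyReLU(0.2)(x)
    # a single sigmoid neuron: the probability that the input image is real
    validity = Dense(1, activation='sigmoid')(x)
    model = Model(img, validity)
    model.compile(loss='binary_crossentropy',
                  optimizer=Adam(0.0002, 0.5),
                  metrics=['accuracy'])
    # frozen so that, once stacked into the combined GAN model,
    # only the generator's weights get updated
    model.trainable = False
    return model
```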
In this function, we first train the discriminator, and after that we train the generator model. We keep the losses in the history variable and plot them once the training is done. Apart from that, we take a snapshot of generated images every 100 epochs.
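Leaving out the snapshot-saving and plotting details, the alternating training loop can be condensed into the sketch below. The toy stand-in models exist only so the snippet runs on its own; the real generator and discriminator are deeper, but the loop logic is the same:

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense, Flatten, LeakyReLU, Reshape
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

LATENT_DIM = 100
IMG_SHAPE = (28, 28, 1)

# Compact stand-ins for the generator and discriminator described earlier.
noise_in = Input(shape=(LATENT_DIM,))
h = Dense(128)(noise_in)
h = LeakyReLU(0.2)(h)
h = Dense(int(np.prod(IMG_SHAPE)), activation='tanh')(h)
generator = Model(noise_in, Reshape(IMG_SHAPE)(h))

img_in = Input(shape=IMG_SHAPE)
d = Flatten()(img_in)
d = Dense(128)(d)
d = LeakyReLU(0.2)(d)
discriminator = Model(img_in, Dense(1, activation='sigmoid')(d))
discriminator.compile(loss='binary_crossentropy', optimizer=Adam(0.0002, 0.5))

# Stacked model: noise -> generator -> (frozen) discriminator.
discriminator.trainable = False
gan_in = Input(shape=(LATENT_DIM,))
combined = Model(gan_in, discriminator(generator(gan_in)))
combined.compile(loss='binary_crossentropy', optimizer=Adam(0.0002, 0.5))


def train(X_train, epochs, batch_size=32):
    real = np.ones((batch_size, 1))
    fake = np.zeros((batch_size, 1))
    history = []
    for epoch in range(epochs):
        # 1) Train the discriminator on a batch of real and a batch of fake images.
        idx = np.random.randint(0, X_train.shape[0], batch_size)
        noise = np.random.normal(0, 1, (batch_size, LATENT_DIM))
        generated = generator.predict(noise, verbose=0)
        discriminator.trainable = True
        d_loss_real = discriminator.train_on_batch(X_train[idx], real)
        d_loss_fake = discriminator.train_on_batch(generated, fake)
        # 2) Train the generator through the stacked model; the discriminator
        #    is frozen here, so only the generator's weights are updated.
        #    The labels are "real" because the generator wants to fool D.
        discriminator.trainable = False
        g_loss = combined.train_on_batch(noise, real)
        history.append((0.5 * (d_loss_real + d_loss_fake), g_loss))
    return history
```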
Using the GAN class is rather simple. All we have to do is create an object of the ImageHelper class first and inject it into the GAN constructor along with the other desired parameters. After that, we can simply call the train function:
Just one note here: the input data is scaled to the -1 to 1 range. This could have been done using Scikit-Learn or some other library, but since we are familiar with the dataset, we have done it manually.
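That manual scaling boils down to a single arithmetic step, sketched here as a small helper (the function name is ours, not the article’s):

```python
import numpy as np


def scale_images(images):
    # map uint8 pixel values [0, 255] to [-1, 1],
    # the range of the generator's tanh output
    return (images.astype('float32') - 127.5) / 127.5

# For Fashion-MNIST this amounts to:
#   (X_train, _), (_, _) = tensorflow.keras.datasets.fashion_mnist.load_data()
#   X_train = np.expand_dims(scale_images(X_train), axis=-1)  # (60000, 28, 28, 1)
```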
At the beginning of the training, the Generator Model is very bad, and the only thing it generates is noise:
However, we can see that by the 1000th epoch we already generate more meaningful images. We can already see some contours:
By the 3000th epoch images are looking even better:
After this, however, we can see a stagnation of the results. The model improves very slowly from this moment on. Take a look at the 10000th epoch:
Final epoch gives us these results:
We can see that the final results are ok-ish. There are a lot of mistakes in the images, and the general feeling is that they should be better. We will improve these results using DCGAN.
The other fun thing to observe is the loss. Take a look at how it oscillates as each model gets better over time:
The biggest problem with GANs is that they are unstable to train (note the loss oscillations in the image above). One architecture that addresses this is the Deep Convolutional GAN, or DCGAN.
In order to stabilize GANs training, authors of DCGAN proposed several improvements:
- Utilizing convolution layers instead of pooling functions in the Discriminator Model for reducing dimensionality. This way, the network itself learns how to reduce dimensionality. In the Generator Model, on the other hand, we use deconvolution to upsample the feature maps.
- Adding batch normalization. This is used to increase the stability of a neural network. In essence, batch normalization normalizes the output of a previous layer by subtracting the batch mean and dividing by the batch standard deviation.
- Removing fully connected layers from the network.
- Using ReLU and Leaky ReLU activation functions.
The implementation of DCGAN is done in the DCGAN class. The structure of the class is pretty much the same as that of the GAN class. The only difference is the layers we use for building our models. Instead of standard layers like Dense, we use convolutional layers like Conv2D and UpSampling2D. Take a look:
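As a sketch of that difference, convolutional builder functions for Fashion-MNIST-sized images could look like this. The filter counts and kernel sizes are illustrative assumptions; note the strided convolutions replacing pooling in the discriminator, and UpSampling2D plus Conv2D replacing Dense in the generator:

```python
from tensorflow.keras.layers import (Input, Dense, Reshape, UpSampling2D, Conv2D,
                                     BatchNormalization, Activation, LeakyReLU,
                                     Flatten)
from tensorflow.keras.models import Model


def build_dcgan_generator(latent_dim=100):
    noise = Input(shape=(latent_dim,))
    # project the noise vector onto a small feature map, then upsample it
    x = Dense(128 * 7 * 7, activation='relu')(noise)
    x = Reshape((7, 7, 128))(x)
    x = UpSampling2D()(x)                               # 7x7 -> 14x14
    x = Conv2D(128, kernel_size=3, padding='same')(x)
    x = BatchNormalization(momentum=0.8)(x)
    x = Activation('relu')(x)
    x = UpSampling2D()(x)                               # 14x14 -> 28x28
    x = Conv2D(64, kernel_size=3, padding='same')(x)
    x = BatchNormalization(momentum=0.8)(x)
    x = Activation('relu')(x)
    img = Conv2D(1, kernel_size=3, padding='same', activation='tanh')(x)
    return Model(noise, img)


def build_dcgan_discriminator(img_shape=(28, 28, 1)):
    img = Input(shape=img_shape)
    # strided convolutions reduce dimensionality instead of pooling layers
    x = Conv2D(32, kernel_size=3, strides=2, padding='same')(img)   # 28 -> 14
    x = LeakyReLU(0.2)(x)
    x = Conv2D(64, kernel_size=3, strides=2, padding='same')(x)     # 14 -> 7
    x = LeakyReLU(0.2)(x)
    x = Conv2D(128, kernel_size=3, strides=2, padding='same')(x)    # 7 -> 4
    x = LeakyReLU(0.2)(x)
    x = Flatten()(x)
    validity = Dense(1, activation='sigmoid')(x)
    return Model(img, validity)
```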
Because we were able to keep the same API, the usage of this class is the same as well. The only difference is that training is done in 20000 epochs, not in 30000 like we did for GAN. The reason for that is that training this network takes considerably longer per epoch.
Here we expect better results, and indeed we got them. In the beginning, we had just noise, just like with GAN:
By the 1000th epoch, we got something more concrete. The result is also better than the GAN’s at the same epoch:
The trend continued at the 3000th epoch:
And at the 10000th epoch:
Finally, in the 20000th epoch we got something like this:
In this article, we had a chance to go deeper into the GAN and DCGAN architectures. We applied the theoretical knowledge from the previous article and implemented these architectures using Python and TensorFlow. In the end, we managed to generate pretty good images using DCGAN, but we can definitely do better. In the next article, we will try to improve these results even more using some more advanced GAN architectures.
Thank you for reading!
This article is a part of Artificial Neural Networks Series, which you can check out here.
Read more posts from the author at Rubik’s Code.