There are many variations of Generative Adversarial Networks. The GAN Zoo has become so big that just scrolling through all the papers that use this concept can make your finger hurt. All jokes aside, the main concepts behind GANs changed the world of deep learning. Their simple architecture, consisting of two neural networks that compete against each other, opened a completely new chapter in the history of neural networks.
Adversarial training, however, was not a new idea even at the time GANs emerged. It can be traced back to machine learning legend Arthur L. Samuel. His two main papers (Samuel 1959; Samuel 1967) are landmarks in Artificial Intelligence. In his 1959 paper, which explored computer checkers, he described the problem of an agent playing a game of checkers against itself. This is a typical example of an adversarial process. Ian Goodfellow, the inventor of GANs, defined the adversarial process as “Training a model in a worst-case scenario, with inputs chosen by an adversary”.
Check out this article if you want to learn exactly how GANs use this approach. So far in our GAN journey, we have had a chance to explore and implement several architectures. Apart from the standard GAN, we explored DCGAN and Adversarial Autoencoders. In the previous article, we got familiar with a special case in this niche: Cycle GAN. These networks are not used for generating data but rather for transferring certain characteristics of images from one domain to images of another domain. This problem is called Unpaired Image-to-Image Translation. Before we dive into the implementation, let’s remind ourselves of the nature and structure of this type of neural network.
Cycle GAN Architecture
In order to solve the problem of transferring style and characteristics of images from one domain to another and vice versa, we will create two sets of Generators and Discriminators. The first Generator (G) has the task of transforming images from domain X to domain Y (G: X → Y), and the second Generator (F) has the task of transforming images from domain Y to domain X (F: Y → X). Their respective adversarial Discriminators are Dy and Dx.
Discriminator Dy pushes generator G to translate inputs from X into outputs that look like images from Y. The second discriminator, Dx, pushes generator F to translate inputs from Y into outputs that look like images from X. Here is what the architecture looks like:
However, this process would be highly unstable if we left it like this, so the mapping of images from one domain to the other needs to be regularized. This is done using two cycle-consistency losses. These losses encourage an image that is translated from one domain to the other and back again to end up the same(ish).
The first loss is called the forward cycle-consistency loss (x → G(x) → F(G(x)) ≈ x), and the second one is called the backward cycle-consistency loss (y → F(y) → G(F(y)) ≈ y). Using this mechanism, Cycle GAN pushes its generators to be consistent with each other. If you want to learn more about the theory and math behind Cycle GAN, check out this article.
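Written out, the two cycle-consistency terms are usually combined into a single loss. The form below follows the original CycleGAN paper, which uses the L1 norm; the implementation in this article may weight or normalize it differently, so treat it as a reference point and not as a description of the code.

```latex
\mathcal{L}_{cyc}(G, F) =
  \mathbb{E}_{x \sim p_{data}(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_{data}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]
```

The first term is the forward cycle and the second term is the backward cycle described above.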
Technologies, Dataset and Helpers
Now that we have had a small recap of how Cycle GAN works, let’s go over the technologies and data that we will use in this article. Apart from that, we will explore one helper class that is used for image manipulation. In this implementation, we are using Python 3.6.5, TensorFlow 1.10.0 and Keras 2.1.6. If you need help with TensorFlow installation, follow this article.
Regarding the dataset, we will use one of the datasets provided by the authors of the architecture – monet2photo. This dataset contains Monet’s paintings and photos of landscapes. We will transfer Monet’s style to landscape photos and make Monet’s paintings look more realistic, so to say. This and other datasets of this type can be downloaded from here.
In some of the previous articles, we used a helper class for image manipulation. In this one, we use a similar class. It is more complicated than in previous examples, but it is still quite straightforward:
Here is the explanation of the functions provided by this class:
- save_image – This method saves the images produced during training. The original and translated images are passed to it, and it uses them to display the results.
- plot20 – Plots 20 images from the defined path.
- load_image – In essence, this method is just a wrapper around scipy.misc.imread (a function that was removed in SciPy 1.2, so it requires an older SciPy release). It loads the image into memory from the predefined location.
- load_testing_image – This method loads random images from the test folder, one image per domain.
- load_batch_of_train_images – This method loads a batch of train images (from the train folder) from both domains.
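To make the idea behind the batch-loading methods more concrete, here is a minimal sketch of unpaired sampling. It is not the helper class from this article: the function names, the seed parameter and the scaling of pixels to [-1, 1] are assumptions made for illustration, and file paths stand in for the loaded images.

```python
import random


def sample_unpaired_batch(paths_x, paths_y, batch_size, seed=None):
    """Pick batch_size file paths from each domain independently.

    The two domains are sampled separately, so an image from X is never
    tied to a particular image from Y (the data is unpaired).
    """
    rng = random.Random(seed)
    batch_x = rng.sample(paths_x, batch_size)
    batch_y = rng.sample(paths_y, batch_size)
    return batch_x, batch_y


def scale_pixels(pixels):
    """Map 8-bit pixel values (0..255) to the range [-1, 1] (assumed convention)."""
    return [p / 127.5 - 1.0 for p in pixels]


# Hypothetical file names, used only to show the call.
monet = ["monet_1.jpg", "monet_2.jpg", "monet_3.jpg"]
photos = ["photo_1.jpg", "photo_2.jpg", "photo_3.jpg", "photo_4.jpg"]
batch_x, batch_y = sample_unpaired_batch(monet, photos, batch_size=2, seed=0)
```

Sampling each domain on its own is what makes the problem unpaired: no painting has a matching photo.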
The implementation of Cycle GAN is located inside of the Python class with the same name – CycleGAN. Note that this is one large class and that we will go through the important parts of implementation separately. Ready? Ok, here it is:
That is a lot of code, right? Let’s split it into smaller chunks and check out the most important parts. There are two main access points of this class – the constructor and the train method. Everything starts with the constructor where the whole model is created. So, let’s explore it first:
As you can see, first the two discriminators are built and compiled. This is done using the _build_discriminator_model and _compile_discriminator_model methods. Then two generator models are created with the help of the _build_generator_model method. Finally, all these graphs are connected together into the architecture described in the previous article. Of course, class fields such as cycle_lambda, _image_helper and optimizer are also initialized inside the constructor. Now, let’s see the helper methods that we used to build our model. First, let’s explore the _build_discriminator_model method:
The discriminator, in this case, is a standard Convolutional Neural Network. Several convolutional layers are used to detect features and, based on them, decide whether the input image comes from the desired domain. Simple as that. Building the generator is a little more complicated, so we will now go through three methods: _encode__layer, _decode_transform_layer and _build_generator_model.
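To get a feel for how a stack of strided convolutions shrinks an image before the discriminator makes its decision, the following sketch traces the spatial size layer by layer. The number of layers and the stride of 2 are assumptions for illustration and are not necessarily the values used in _build_discriminator_model.

```python
import math


def conv_output_size(size, stride):
    """Output height/width of a convolution with 'same' padding."""
    return math.ceil(size / stride)


def trace_discriminator(input_size=128, strides=(2, 2, 2, 2)):
    """Return the spatial size after each (hypothetical) convolutional layer."""
    sizes = [input_size]
    for stride in strides:
        sizes.append(conv_output_size(sizes[-1], stride))
    return sizes


print(trace_discriminator())  # [128, 64, 32, 16, 8]
```

With these assumed settings, a 128×128 input ends up as an 8×8 grid, so a discriminator built this way would score patches of the image instead of producing a single value per image.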
Essentially, our generator is built from three parts: a so-called encoding part, a transformer part and a decoding part. This can be visualized like this:
So, we need several encoding layers for down-sampling, several transformation layers for applying styles and several upsampling, or decoding, layers. These specific layers are created in the functions _encode__layer and _decode_transform_layer. In the first one, the encoding parts are built using convolutional layers, and in the second one, the transformation and decoding layers are created using upsampling layers. All of these are connected inside of the _build_generator_model method.
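The encode, transform and decode structure can be summarized by following the image size through the generator. This is only a sketch: the layer counts below (2 encoding, 6 transformation and 2 decoding layers) and the factor of 2 per layer are assumptions, not values taken from _build_generator_model.

```python
def trace_generator(input_size=128, n_encode=2, n_transform=6, n_decode=2):
    """Return (stage, spatial size) pairs for a hypothetical generator."""
    size = input_size
    trace = [("input", size)]
    for i in range(n_encode):
        size //= 2  # a stride-2 convolution halves height and width
        trace.append(("encode_%d" % (i + 1), size))
    for i in range(n_transform):
        # transformation layers keep the spatial size unchanged
        trace.append(("transform_%d" % (i + 1), size))
    for i in range(n_decode):
        size *= 2  # an upsampling layer doubles height and width
        trace.append(("decode_%d" % (i + 1), size))
    return trace


trace = trace_generator()
for stage, size in trace:
    print(stage, size)
```

The important property is that the number of decoding steps matches the number of encoding steps, so the translated image comes out at the same size as the input.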
In the end, all the created generators and discriminators are connected using the _build_and_compile_gan method:
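The combined model is where the adversarial and cycle-consistency terms meet. As a rough sketch of the arithmetic for one translation direction, and assuming a mean-squared-error adversarial term and a mean-absolute-error cycle term weighted by cycle_lambda (the choice of losses and the default weight of 10 are assumptions about this implementation), the generator objective could be computed like this, with plain Python lists standing in for tensors:

```python
def mean_squared_error(predicted, target):
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def mean_absolute_error(predicted, target):
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)


def generator_objective(disc_scores, reconstructed, original, cycle_lambda=10.0):
    """Adversarial term plus weighted cycle-consistency term.

    disc_scores   -- discriminator outputs for the translated images
    reconstructed -- pixels after the round trip, e.g. F(G(x))
    original      -- pixels of the input image x
    """
    real_labels = [1.0] * len(disc_scores)  # the generator wants "real" verdicts
    adversarial = mean_squared_error(disc_scores, real_labels)
    cycle = mean_absolute_error(reconstructed, original)
    return adversarial + cycle_lambda * cycle


# A fooled discriminator and a perfect round trip give zero loss.
print(generator_objective([1.0, 1.0], [0.5, 0.5], [0.5, 0.5]))  # 0.0
```

A larger cycle_lambda makes the generators care more about getting the original image back than about fooling the discriminators.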
Finally, let’s examine the only public method of this class – train method:
In this method, the model is utilized along with the image helper. First, we define the ground truth variables that are used during training (real and fake). Once that is done, we read a batch of images from the training folders and push them through the generator models. This way we get the translated images. After that, we proceed with training the discriminators. Once the discriminators are trained, we train the generators and repeat the whole thing for the defined number of epochs.
Because we extracted image handling into a separate class and the complicated parts of model creation into private methods, it is quite easy to use the CycleGAN class:
First, we create an ImageHelper instance, which we inject into the CycleGAN object. After that, we just run the train method. If you want to use any other dataset, all you have to do is download it and change the path that points to it. Note that we used a 128×128 image size for processing, but you can experiment with any size you like.
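The order of operations in one training iteration can be sketched as follows. This is not the train method from the CycleGAN class: StubModel is a hypothetical stand-in that only records calls, the names g_xy, g_yx, d_x, d_y and combined are made up for illustration, and the list of targets passed to the combined model is an assumption about how its outputs are arranged.

```python
class StubModel:
    """Stand-in for a Keras model; records calls instead of learning."""

    def __init__(self, name):
        self.name = name
        self.calls = []

    def predict(self, batch):
        self.calls.append("predict")
        return batch  # identity "translation"

    def train_on_batch(self, inputs, targets):
        self.calls.append("train_on_batch")
        return 0.0  # dummy loss


def train_step(g_xy, g_yx, d_x, d_y, combined, batch_x, batch_y):
    """One hypothetical iteration: discriminators first, then generators."""
    real = [1.0] * len(batch_x)  # ground truth for real images
    fake = [0.0] * len(batch_x)  # ground truth for translated images

    fake_y = g_xy.predict(batch_x)  # translate X -> Y
    fake_x = g_yx.predict(batch_y)  # translate Y -> X

    # Each discriminator sees real images and translated images.
    d_loss = 0.25 * (
        d_x.train_on_batch(batch_x, real) + d_x.train_on_batch(fake_x, fake)
        + d_y.train_on_batch(batch_y, real) + d_y.train_on_batch(fake_y, fake)
    )

    # Generators are trained through the combined model: they should fool
    # both discriminators and reconstruct the original images.
    g_loss = combined.train_on_batch(
        [batch_x, batch_y], [real, real, batch_x, batch_y]
    )
    return d_loss, g_loss


models = {n: StubModel(n) for n in ("g_xy", "g_yx", "d_x", "d_y", "combined")}
losses = train_step(
    models["g_xy"], models["g_yx"], models["d_x"], models["d_y"],
    models["combined"], batch_x=[0.1, 0.2], batch_y=[0.3, 0.4],
)
```

In the real class, this step would be repeated for every batch and for the defined number of epochs, with save_image called from time to time to inspect the progress.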
So let’s see how our solution for Unpaired Image-to-Image Translation problem turned out. As expected, in the beginning, our results were catastrophic. In our first epoch we got these results:
However, already by the fifth epoch, we were able to see a lot of improvements:
When training reached the 40th epoch, even on small images like 128×128, we were able to see further improvements:
Notice how the sky in the first transformed image has a more realistic feel to it and how Monet’s fast brushstrokes show up in the second transformed image. The results got even better by epoch 70. Check it out:
Once again, notice how we can see the desired change in both of the transformed images. Finally, here is what we got at epoch 100:
In this article, we applied some of the theoretical and math knowledge we got in the previous article and implemented the Cycle GAN architecture using Python, TensorFlow and Keras.
Thank you for reading!
This article is a part of Artificial Neural Networks Series, which you can check out here.
Read more posts from the author at Rubik’s Code.