If you are wondering what is so interesting about generative adversarial networks (GANs), please refer to the following link. In this article we dive deeper into GANs and look at how the technique works.

There have been many improvements to generative adversarial networks (GANs) over time, but let’s go back to the original formulation in order to understand the concept.

## Intuition

Have you heard about the art forger Mark Landis?

It is quite an interesting story. He submitted forgeries to several art museums and got better at making them over time. He often donated his counterfeits to these museums with doctored documents and even dressed as a priest to avoid suspicion. Leininger (curatorial department) was the first person to pursue Landis. You can read more about it here. For the purpose of explaining the concept, though, we need only limited knowledge of this event.

Imagine that you are Leininger, in charge of deciding whether a presented painting is fake or authentic. Meanwhile, Landis is making his first forgery.

At first, you find it easy to identify a fake. However, over time both Landis and you get better. Landis develops more sophisticated skills, making it increasingly difficult for you to spot fakes.

### How does it work?

To connect with the example in the previous section, think of the generator as Landis and the discriminator as Leininger. Here, however, the generator and the discriminator are two separate neural networks, each trying to reduce its own error.

The generator tries to produce output that fools the discriminator, while the discriminator tries to tell real data from fake data. In other words, a generative adversarial network (GAN) is inspired by a zero-sum non-cooperative game: the generator tries to maximize the number of times it fools the discriminator, while the discriminator tries to minimize that number.

These networks use back-propagation to reduce the error.

Put differently, the generator and discriminator are two adversaries, or opponents, playing a game. They go back and forth against each other, each improving its skill over time.

## Discriminator

• Classifies both real and fake data
• Updates weights based on the discriminator loss
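To make the two bullets concrete, here is a minimal sketch of one discriminator update. It assumes one-dimensional data and a logistic discriminator D(x) = sigmoid(w * x + b); the function name `d_loss_and_grads`, the sample values and the learning rate are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def d_loss_and_grads(w, b, real, fake):
    """Discriminator loss (binary cross-entropy) and its gradients.

    Real samples should get D(x) close to 1, fake samples close to 0.
    """
    loss = grad_w = grad_b = 0.0
    for x in real:
        p = sigmoid(w * x + b)
        loss -= math.log(p) / len(real)
        grad_w -= (1.0 - p) * x / len(real)
        grad_b -= (1.0 - p) / len(real)
    for x in fake:
        p = sigmoid(w * x + b)
        loss -= math.log(1.0 - p) / len(fake)
        grad_w += p * x / len(fake)
        grad_b += p / len(fake)
    return loss, grad_w, grad_b

# Hypothetical toy batches: real values cluster higher than fake ones.
real, fake = [2.0, 2.5, 3.0], [-1.0, 0.0, 0.5]
w, b, lr = 0.0, 0.0, 0.1

loss_before, gw, gb = d_loss_and_grads(w, b, real, fake)
w, b = w - lr * gw, b - lr * gb          # one gradient-descent step
loss_after, _, _ = d_loss_and_grads(w, b, real, fake)
print(loss_before, loss_after)
```

A single step already lowers the loss here, because the untrained discriminator starts out assigning probability 0.5 to every sample.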

## Generator

• Takes random noise as input. The noise can be drawn from any distribution; we usually choose one that is easy to sample from and has fewer dimensions than the output.
• Transforms the random input into a more suitable form.
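As a rough illustration, the sketch below samples low-dimensional Gaussian noise and pushes it through a single dense layer with a tanh activation. The layer, the dimensions (`noise_dim`, `output_dim`) and the random, untrained weights are all assumptions made for the example; a real generator would be a deeper, trained network.

```python
import math
import random

def sample_noise(dim, rng):
    """Draw a noise vector z from a standard Gaussian, which is easy to sample."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def generator(z, weights, bias):
    """One dense layer followed by tanh: maps noise to an output vector."""
    return [
        math.tanh(sum(w_ij * z_j for w_ij, z_j in zip(row, z)) + b_i)
        for row, b_i in zip(weights, bias)
    ]

rng = random.Random(0)
noise_dim, output_dim = 2, 4   # noise has fewer dimensions than the output

# Untrained, randomly initialised parameters (hypothetical values).
weights = [[rng.uniform(-1.0, 1.0) for _ in range(noise_dim)]
           for _ in range(output_dim)]
bias = [0.0] * output_dim

z = sample_noise(noise_dim, rng)
fake = generator(z, weights, bias)
print(z, fake)
```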

## Loss Function

The loss function consists of two parts:

> Loss while identifying real data points + Loss from generated / fake data.

Both networks share this one objective: the discriminator tries to maximize it, while the generator tries to minimize it.

$\min_G \max_D V(D, G)= \mathbb{E}_{x\sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z\sim p_z(z)}[\log(1 - D(G(z)))]$

Let D be the discriminator and G the generator.
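As a small numerical illustration, the value function can be estimated by averaging over a mini-batch of discriminator outputs. The helper `value_function` and the probabilities below are made up for the example; they are not outputs of a trained model.

```python
import math

def value_function(d_on_real, d_on_fake):
    """Mini-batch estimate of V(D, G).

    d_on_real: discriminator outputs D(x) for real samples
    d_on_fake: discriminator outputs D(G(z)) for generated samples
    """
    real_term = sum(math.log(p) for p in d_on_real) / len(d_on_real)
    fake_term = sum(math.log(1.0 - p) for p in d_on_fake) / len(d_on_fake)
    return real_term + fake_term

# A discriminator that separates real from fake almost perfectly ...
strong_d = value_function([0.99, 0.98], [0.01, 0.02])
# ... versus one that cannot tell them apart and always answers 0.5.
fooled_d = value_function([0.5, 0.5], [0.5, 0.5])
print(strong_d, fooled_d)
```

The value is close to zero when the discriminator wins and drops to -log 4 when it is reduced to guessing, which is why the discriminator maximizes V and the generator minimizes it.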

To start with, consider G(z), which is the output of the generator neural network for the noise input z. Note that the noise is sampled randomly from a probability distribution. Based on the weights it has learned, the generator transforms the noise into, hopefully, something more meaningful.

D(G(z)) is the discriminator's output when it is given the generator's output as input.

In other words, at this point we are computing the probability that the discriminator assigns to a fake instance (forgery) being real.

$\mathbb{E}_{z\sim p_z(z)}[\log(1 - D(G(z)))]$

The term above can be summarized in the following points:

1. D(G(z)) is the probability that the generator’s transformed output is classified as real.
2. log(1 - D(G(z))) is not equal to -log(D(G(z))), but both decrease as D(G(z)) increases, so minimizing the first pushes the generator in the same direction as maximizing log(D(G(z))).
3. We take the expectation of these values over the noise distribution. At the start of training, the generator produces output that is far from the ground truth, and the discriminator finds it easy to identify the fake data points.
4. In other words, the generator is trained to minimize E[log(1 - D(G(z)))] or, alternatively, to maximize E[log(D(G(z)))]. As the generator gets better, the discriminator has to become stronger as well: when the generated images move closer to the ground truth, the discriminator has to look for more complex patterns to distinguish an authentic image from a fake one.
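The two generator objectives in point 4 can be compared through their gradients. The sketch below assumes the discriminator ends in a sigmoid, so that D(G(z)) = sigmoid(s) for some logit s; the function names are hypothetical.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def grad_minimax(s):
    """d/ds of log(1 - sigmoid(s)), the term the generator minimizes."""
    return -sigmoid(s)

def grad_alternative(s):
    """d/ds of -log(sigmoid(s)), i.e. maximizing log D(G(z))."""
    return -(1.0 - sigmoid(s))

# s = -6: discriminator confidently rejects the fake (typical early in training)
# s =  0: discriminator is undecided
# s = +6: discriminator is fooled
for s in (-6.0, 0.0, 6.0):
    print(f"logit={s:+.1f}  D(G(z))={sigmoid(s):.4f}  "
          f"minimax grad={grad_minimax(s):+.4f}  "
          f"alternative grad={grad_alternative(s):+.4f}")
```

Both gradients are negative, so gradient descent on either objective raises D(G(z)). When D(G(z)) is near zero, however, the first gradient is tiny while the second is close to one in magnitude, which is one reason the second form is often preferred in practice.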

The remaining part of the loss function is rather simple.

$\mathbb{E}_{x\sim p_{data}(x)}[\log D(x)]$

• The input to the discriminator here is the real data.
• The discriminator tries to maximize this term; that is, we want the discriminator to get better at recognizing real images.

## Training Process

• The training process consists of simultaneous SGD.
• On each step, two mini-batches are sampled: one from the real data and the other from generated / fake data.
• Two gradient steps are made simultaneously: one updates the discriminator and the other updates the generator, each reducing its own error.
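The three steps above can be sketched on a deliberately tiny problem. Everything specific here is an assumption made for illustration: the real data is one-dimensional Gaussian with mean 4, the generator is G(z) = a * z + c, the discriminator is D(x) = sigmoid(w * x + b), and the gradients are written out by hand instead of using back-propagation through deep networks.

```python
import math
import random

def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def train(steps, batch_size=64, lr=0.05, seed=0):
    rng = random.Random(seed)
    w, b = 0.0, 0.0   # discriminator D(x) = sigmoid(w * x + b)
    a, c = 1.0, 0.0   # generator     G(z) = a * z + c
    for _ in range(steps):
        # Two mini-batches: one real, one generated from fresh noise.
        real = [rng.gauss(4.0, 0.5) for _ in range(batch_size)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch_size)]
        fake = [a * z + c for z in noise]

        gw = gb = ga = gc = 0.0
        # Discriminator minimizes -log D(x) - log(1 - D(G(z))).
        for x in real:
            p = sigmoid(w * x + b)
            gw -= (1.0 - p) * x / batch_size
            gb -= (1.0 - p) / batch_size
        for z, x in zip(noise, fake):
            p = sigmoid(w * x + b)
            gw += p * x / batch_size
            gb += p / batch_size
            # Generator minimizes log(1 - D(G(z))).
            ga -= p * w * z / batch_size
            gc -= p * w / batch_size

        # Simultaneous step: both updates use gradients from the same parameters.
        w, b = w - lr * gw, b - lr * gb
        a, c = a - lr * ga, c - lr * gc
    return {"w": w, "b": b, "a": a, "c": c}

params = train(steps=200)
print(params)
```

In this toy setting the generator's offset c drifts toward the real data as long as the discriminator scores larger values as more real. GAN training is known to oscillate, so the sketch makes no claim about where the parameters end up.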

In this article we have not explored certain concepts in much detail, such as:

• The different choices available for the loss function
• Finding the optimal Discriminator
• The divergence, which in simple terms is a measure of the distance between two probability distributions. In our case, we want the probability distribution of the generated (fake) data to be close to the probability distribution of the real data.
• Label smoothing, Batch Normalization and other tips
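
As a small taste of the divergence idea from the list above, the sketch below computes the Jensen-Shannon divergence between discrete distributions; the original GAN objective is related to this divergence when the discriminator is optimal. The three distributions are invented for the example.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: a symmetric, bounded variant of KL."""
    m = [(p_i + q_i) / 2 for p_i, q_i in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

p_data = [0.10, 0.40, 0.50]        # hypothetical real-data distribution
p_gen_early = [0.80, 0.10, 0.10]   # hypothetical generator, early in training
p_gen_late = [0.15, 0.40, 0.45]    # hypothetical generator, later in training

print(js_divergence(p_data, p_gen_early))
print(js_divergence(p_data, p_gen_late))
```

The divergence shrinks as the generated distribution moves toward the real one and is zero only when the two match.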