GAN

Network as Generator

Compared to a traditional neural network, a generator network takes a random sample $z$ from a simple distribution (e.g., a Gaussian) as an extra input; because $z$ varies from sample to sample, its outputs form a distribution rather than a single fixed prediction.
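A minimal PyTorch sketch of this idea (the architecture and dimensions are illustrative assumptions, not from these notes): the generator receives a vector $z$ drawn from a normal distribution, and feeding in different $z$ yields different outputs.

```python
import torch
import torch.nn as nn

# Hypothetical generator: maps a latent vector z ~ N(0, I) to an output vector.
# The layer sizes are illustrative only.
class Generator(nn.Module):
    def __init__(self, latent_dim=100, out_dim=28 * 28):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, out_dim),
            nn.Tanh(),  # outputs in [-1, 1]
        )

    def forward(self, z):
        return self.net(z)

# Sampling different z gives different outputs, so the generator
# effectively outputs a distribution rather than a single point.
G = Generator()
z = torch.randn(16, 100)   # samples from the simple input distribution
fake = G(z)                # 16 generated samples
```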

Why Output a Distribution?

When the same input may correspond to different outputs, the network should output a distribution; a deterministic network tends to average the possible outputs and produce inaccurate, blurry predictions. For example, when predicting the next frame of a video game in which a character may turn either left or right, a deterministic network blends both futures into a single implausible frame.

Generative Adversarial Networks (GANs)

Anime Face Generation

  • Unconditional generation: the generator's input is not paired with a specific condition; it is only a vector $z$ sampled from a simple distribution (e.g., a normal distribution).

Discriminator

The Discriminator is a neural network that outputs a scalar value for each input, with higher values indicating real samples and lower values indicating fake ones. The architecture of the discriminator can be customized, but its goal is to produce a scalar that estimates the quality of generated images.
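Continuing the sketch above, a hypothetical discriminator (layer sizes are again assumptions); the only fixed requirement is a single scalar score per input, with higher meaning "looks real":

```python
import torch.nn as nn

# Illustrative discriminator: any architecture works as long as the
# output is one scalar score per input (higher = more likely real).
class Discriminator(nn.Module):
    def __init__(self, in_dim=28 * 28):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
            nn.Sigmoid(),  # score in (0, 1)
        )

    def forward(self, x):
        return self.net(x)
```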

Basic Idea of GAN

The generator improves over time by learning from the discriminator’s feedback, and the discriminator’s standards become stricter as the generator improves. This adversarial process is the core of GANs.

GAN Algorithm

  1. The discriminator learns to classify real versus generated objects (with the generator's parameters fixed).
  2. The generator learns to "fool" the discriminator.
    • Fix the discriminator's parameters while updating the generator.
  3. Repeat steps 1 and 2 alternately (see the sketch after this list).
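A sketch of this alternating loop, reusing the hypothetical `Generator` and `Discriminator` above; `dataloader`, `latent_dim`, and the hyperparameters are assumptions:

```python
import torch
import torch.nn.functional as F

# G, D, dataloader, latent_dim are assumed to be defined as in the sketches above.
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)

for real in dataloader:                      # real: a batch of real samples
    # --- Step 1: update the discriminator (generator fixed) ---
    z = torch.randn(real.size(0), latent_dim)
    fake = G(z).detach()                     # detach: do not update G here
    loss_D = F.binary_cross_entropy(D(real), torch.ones(real.size(0), 1)) + \
             F.binary_cross_entropy(D(fake), torch.zeros(real.size(0), 1))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # --- Step 2: update the generator (discriminator fixed) ---
    z = torch.randn(real.size(0), latent_dim)
    loss_G = F.binary_cross_entropy(D(G(z)), torch.ones(real.size(0), 1))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```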

Theory Behind GAN

Let $P_G$ be the distribution of objects produced by the generator $G$ and $P_{data}$ be the real data distribution. The objective is to find

$$G^* = \arg\min_G \mathrm{Div}(P_G, P_{data})$$

The challenge is computing this divergence directly.

Although we don't know the full forms of $P_G$ and $P_{data}$, we can sample from them, and these samples allow us to estimate the divergence through the discriminator.

The discriminator $D$ should assign high scores to samples from $P_{data}$ and low scores to samples from $P_G$. Its objective function is

$$V(G, D) = \mathbb{E}_{y \sim P_{data}}[\log D(y)] + \mathbb{E}_{y \sim P_G}[\log(1 - D(y))]$$

and maximizing $V(G, D)$ is equivalent to minimizing cross-entropy in a binary classification task (real vs. generated).

A small value of $\max_D V(G, D)$ means it is hard to discriminate between real and generated samples (the divergence is small), while a large value means the discriminator separates them easily (the divergence is large). Our task is to minimize $\mathrm{Div}(P_G, P_{data})$, which is difficult to compute directly; however, $\max_D V(G, D)$ is related to this divergence, so we can use it as a replacement:

$$G^* = \arg\min_G \max_D V(G, D)$$
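For reference, the classical result from the original GAN paper (not derived in these notes) makes this relation precise: for a fixed $G$, the optimal discriminator is $D^*(y) = \frac{P_{data}(y)}{P_{data}(y) + P_G(y)}$, and plugging it back into the objective gives

$$\max_D V(G, D) = -2\log 2 + 2\,\mathrm{JSD}(P_{data} \,\|\, P_G)$$

so minimizing $\max_D V(G, D)$ over $G$ is equivalent to minimizing the JS divergence.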

Tips for GANs

Problems of JS Divergence

In most cases, $P_G$ and $P_{data}$ have minimal overlap: both are low-dimensional manifolds embedded in a high-dimensional space (like two curves in 3-D space), so their overlap is negligible.

Moreover, since we only sample from these distributions, the samples are unlikely to overlap even if the actual distributions do. When two distributions do not overlap, the JS divergence is always the constant $\log 2$, regardless of how close they are. This obscures meaningful progress during training, since the measure only becomes informative once the distributions overlap.

In practice, this means the discriminator achieves 100% accuracy due to the lack of overlap, which doesn’t reflect improvements in the generator. Thus, JS divergence is not a reliable measure during GAN training.

Wasserstein Distance

To improve upon JS divergence, consider the Wasserstein distance (also called the earth mover's distance). Think of one distribution as a pile of earth and the other as the target shape; moving the earth to reshape the pile into the target has some average moving distance.

Among all possible "moving plans," we use the one with the smallest average distance to define the Wasserstein distance, which decreases as the generator improves.
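A toy 1-D illustration with SciPy (the numbers are made up): as the generated samples move toward the real ones, the Wasserstein distance shrinks smoothly, whereas the JS divergence would stay at $\log 2$ in both non-overlapping cases.

```python
from scipy.stats import wasserstein_distance

# Illustrative 1-D samples; neither "fake" set overlaps the real one,
# yet the Wasserstein distance still reflects how far apart they are.
real_samples = [0.0, 0.0, 0.0, 0.0]
far_fake     = [10.0, 10.0, 10.0, 10.0]
near_fake    = [1.0, 1.0, 1.0, 1.0]

print(wasserstein_distance(real_samples, far_fake))   # 10.0
print(wasserstein_distance(real_samples, near_fake))  # 1.0
```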

Wasserstein Distance and WGAN

We evaluate the Wasserstein distance by solving

$$W(P_{data}, P_G) = \max_{D \in \text{1-Lipschitz}} \left\{ \mathbb{E}_{y \sim P_{data}}[D(y)] - \mathbb{E}_{y \sim P_G}[D(y)] \right\}$$

The function $D$ must be 1-Lipschitz (sufficiently smooth); without this constraint, $D$ could grow without bound wherever the two distributions do not overlap, the maximum would be infinite, and the training would not converge.
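A sketch of one critic update in the original WGAN, which enforces the 1-Lipschitz constraint only roughly by clipping weights (gradient penalty, as in WGAN-GP, is a common alternative). `G`, `dataloader`, and `latent_dim` are assumed as before, and the critic `D` here must output an unbounded scalar (no sigmoid):

```python
import torch

# Weight clipping: the original WGAN's crude way to keep D roughly 1-Lipschitz.
opt_D = torch.optim.RMSprop(D.parameters(), lr=5e-5)
clip_value = 0.01

for real in dataloader:
    z = torch.randn(real.size(0), latent_dim)
    fake = G(z).detach()
    # Maximize E[D(real)] - E[D(fake)]  <=>  minimize the negative.
    loss_D = -(D(real).mean() - D(fake).mean())
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()
    for p in D.parameters():
        p.data.clamp_(-clip_value, clip_value)
```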

GAN for Sequence Generation

One major challenge for GANs is generating sequences such as text. A small change in the decoder's parameters may not change the token selected by the max (or sampling) step, so the discriminator's score stays the same and no gradient flows back to the generator; the pipeline is effectively non-differentiable at this step.

Evaluation of Generation

Quality of Generated Images

The task is to automatically evaluate the quality of generated images. One method is to use a pre-trained image classifier that takes a generated image $y$ as input and outputs a distribution over classes $P(c \mid y)$: if this distribution is concentrated on one class, the image is likely sharp and recognizable; if it is spread out, the image is likely of poor quality.

Diversity

One key issue in GANs is Mode Collapse, where the generator produces high-quality images but only a few distinct types. It initially appears effective, but over time it becomes clear that the generator only outputs a limited set of images.

Another problem is Mode Dropping, where the generated distribution covers only part of the real data distribution. The outputs may look diverse at first glance, yet some modes of the real data never appear; for example, the generated faces may only vary in skin color from one iteration to the next while genuinely new faces never show up.

Quality is judged on a single image (its class distribution should be concentrated), while diversity is assessed over many images (their averaged class distribution should be flat). High quality and high diversity together yield a large Inception Score (IS), although there is often a trade-off between the two.
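A minimal sketch of the Inception Score computed from classifier outputs (the pre-trained classifier itself is assumed to exist elsewhere; real implementations also average over several splits of the data):

```python
import torch

def inception_score(probs, eps=1e-12):
    """Toy Inception Score from a matrix of class probabilities.

    probs: (N, C) tensor, each row the classifier's softmax output P(c|y)
           for one generated image.
    Quality   -> each row should be sharp (low entropy).
    Diversity -> the average over rows should be flat (high entropy).
    """
    p_c = probs.mean(dim=0, keepdim=True)                       # marginal P(c)
    kl = (probs * (torch.log(probs + eps) - torch.log(p_c + eps))).sum(dim=1)
    return torch.exp(kl.mean())

# Example: sharp but identical predictions -> good quality, no diversity.
sharp_same = torch.tensor([[0.98, 0.01, 0.01]] * 100)
print(inception_score(sharp_same))   # ~1, the lowest possible score
```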

Frechet Inception Distance (FID)

Rather than using the classifier's final class probabilities, FID uses the feature vector just before the softmax layer. The real and generated feature vectors are each treated as samples from a multivariate Gaussian, and FID is the Fréchet distance between these two Gaussians; smaller is better.
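A sketch of the FID computation from two sets of pre-softmax features (the Inception feature extractor is assumed; both sets are modeled as Gaussians):

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(real_feats, fake_feats):
    """Fréchet distance between two sets of feature vectors.

    real_feats, fake_feats: (N, d) arrays of pre-softmax features.
    """
    mu_r, mu_f = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_f = np.cov(fake_feats, rowvar=False)
    covmean = sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):   # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    return np.sum((mu_r - mu_f) ** 2) + np.trace(cov_r + cov_f - 2 * covmean)
```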

Conditional GAN

Conditional GAN: given a condition $x$ and a noise vector $z$ sampled from a simple distribution, produce $y = G(x, z)$.

Example: generating an image $y$ from a text description $x$.

The discriminator should take both the condition $x$ and the generated image $y$ as inputs. If it only receives $y$, the generator may learn to ignore the input condition and simply produce realistic but unrelated images.

This requires paired data $(x, y)$ so the discriminator can judge both the realism of $y$ and how well $y$ matches the condition $x$.
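A hypothetical conditional discriminator along these lines (dimensions are assumptions): it scores the pair $(x, y)$ jointly, so a realistic image that ignores the condition still receives a low score.

```python
import torch
import torch.nn as nn

# Illustrative conditional discriminator: concatenates the condition x
# (e.g., an embedded text description) with the image y and outputs one score.
class ConditionalDiscriminator(nn.Module):
    def __init__(self, cond_dim=128, img_dim=28 * 28):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(cond_dim + img_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
            nn.Sigmoid(),
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=1))
```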

Learning from Unpaired Data

For tasks like image style transfer, GANs can help learn mappings without paired data.

Cycle GAN

Cycle GANs aim to ensure the generator’s output relates to its input.

In a Cycle GAN, one generator transforms an image from domain $X$ to domain $Y$, while a second generator maps it from $Y$ back to $X$. The reconstruction should closely match the original input (cycle consistency), which forces the intermediate output in domain $Y$ to stay related to the original input.
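A sketch of just the cycle-consistency term (the two generators `G_xy`: $X \rightarrow Y$ and `G_yx`: $Y \rightarrow X$ are assumed to be defined elsewhere; the adversarial losses for each domain are omitted here):

```python
import torch.nn.functional as F

def cycle_consistency_loss(G_xy, G_yx, x, y):
    # Reconstruct each image by going around the cycle and compare
    # it with the original using an L1 distance.
    x_rec = G_yx(G_xy(x))   # X -> Y -> X
    y_rec = G_xy(G_yx(y))   # Y -> X -> Y
    return F.l1_loss(x_rec, x) + F.l1_loss(y_rec, y)
```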