Generative Adversarial Networks, Simplified

AI, But Simple Issue #31

Hello from the AI, but simple team! If you enjoy our content, consider supporting us so we can keep doing what we do.

Our newsletter is no longer sustainable to run at no cost, so we’re relying on different measures to cover operational expenses. Thanks again for reading!

Imagine teaching two clever students who have very different talents to improve their skills by challenging each other while you let them figure things out on their own. That is the fundamental idea behind Generative Adversarial Networks (GANs), a generative model that has revolutionized how we think about generating data.

Most of the time, standard neural networks are used for making predictions. In these models, we feed in data, transform it, and produce a prediction, such as a class label or a numeric value.

This kind of learning is called discriminative learning, as in, we’d like to be able to discriminate between a spam email and a non-spam email. Classifiers and regressors are both examples of discriminative learning.

Neural networks trained through backpropagation have transformed our ability to handle large, complex datasets, making discriminative learning extremely important in modern deep learning applications.

However, while discriminative learning has made huge strides, it has its limitations. Until about 10 years ago, generative tasks (those that involve creating new data, like generating “real-looking” images) remained an underdeveloped corner of deep learning, even though the field badly needed them.

But the success of deep neural networks in discriminative learning led to new possibilities. Over the past few years, researchers have started applying discriminative models in ways we wouldn't traditionally consider “supervised learning”.

In 2014, a breakthrough paper introduced Generative Adversarial Networks (GANs) (Goodfellow et al., 2014), a clever new way to leverage the power of discriminative models to create good generative models.

The core idea behind GANs is simple yet very powerful: a data generator is considered successful if it can produce data that is not distinguishable from real data.

This concept reflects a statistical technique called a two-sample test, which asks whether two datasets come from the same distribution.

However, the main difference between most statistical uses of this test and GANs is that GANs use the idea in a constructive way.

In other words, rather than just training a model (called a discriminator) to say, “Hey, these two datasets do not look like they came from the same distribution,” they use the two-sample test to give training signals to a generative model.

This allows us to improve the generator until it generates something that resembles the real data. At the very least, it needs to fool the discriminator.

Architecture

A GAN consists of two main components: a generator and a discriminator.

The generator network should produce data that resembles the target data.

The discriminator attempts to distinguish generated (fake) and real data—this is the key that makes GANs so effective.

The discriminator is a binary classifier, typically ending in a fully-connected layer with a single neuron and a sigmoid activation, which outputs the predicted probability that an input sample is real.
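To make this concrete, here’s a minimal sketch of the two networks in PyTorch (our choice of framework; the layer sizes and the latent_dim and data_dim values below are illustrative, assuming flattened 28x28 grayscale images):

```python
import torch
import torch.nn as nn

latent_dim = 100   # size of the random noise vector (our choice, for illustration)
data_dim = 784     # e.g., a flattened 28x28 grayscale image

# The generator maps random noise to a fake data sample.
generator = nn.Sequential(
    nn.Linear(latent_dim, 256),
    nn.ReLU(),
    nn.Linear(256, data_dim),
    nn.Tanh(),  # squashes outputs to [-1, 1], matching normalized real data
)

# The discriminator is a binary classifier: a single output neuron plus a
# sigmoid gives the predicted probability that its input is real.
discriminator = nn.Sequential(
    nn.Linear(data_dim, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
    nn.Sigmoid(),
)
```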

These two models (the generator and the discriminator) are trained together. The generator tries to fool the discriminator, and the discriminator tries not to be fooled.

They act like opponents in a game, constantly pushing each other to improve.

The generator produces new fake data, and the discriminator adapts to catch it. The discriminator’s feedback is then used to improve the generator, the improved generator pushes the discriminator to sharpen further, and so on.

Eventually, if training goes well, the generator produces outputs so convincing that even the discriminator struggles to tell them apart from genuine data. At this point, we’ve made a good generator.

How Does a GAN Work?

The generator begins by taking random input or noise—this could be something like a vector of random numbers or a noisy image. Using this noisy input, it tries to generate a corresponding data sample.

The reason GANs start with random noise is that it gives the generator a blank canvas. Without any structured input, the generator has to learn from scratch everything about how to generate good data.

This randomness ensures that the generator can explore a wide range of possible outputs, allowing it to learn a variety of ways to produce realistic samples.
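In code, this blank canvas is nothing more than a batch of vectors drawn from a simple distribution, typically a standard normal. Continuing the sketch from earlier:

```python
# Draw a batch of 64 random noise vectors and map them to fake samples.
z = torch.randn(64, latent_dim)   # standard normal noise
fake_samples = generator(z)       # shape: (64, data_dim)
```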

The generated output is then shown to the discriminator along with some real samples. The discriminator tries to correctly identify which samples are real and which are fake, with its performance measured by a loss function.

  • The generator’s loss measures how well it fooled the discriminator; the discriminator’s loss measures how accurately it identified real or fake samples.
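Formally, the 2014 paper frames this as a minimax game over a single value function, which the discriminator D tries to maximize and the generator G tries to minimize:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

Here D(x) is the discriminator’s probability that x is real, and G(z) is the sample the generator produces from noise z. In practice, each model is trained with binary cross-entropy on its side of this objective, and the generator often maximizes log D(G(z)) instead (the “non-saturating” loss), which gives it stronger gradients early in training.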

Like standard neural networks, we then calculate gradients (slopes) that tell us how much each parameter contributed to the error.

We update both the generator’s and discriminator’s weights in the direction that reduces their respective losses (using an optimizer like Adam or SGD).

Over many iterations, these weight updates slowly push each model to improve.
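Putting the pieces together, one training iteration might look like the following sketch, reusing the generator, discriminator, and latent_dim defined earlier and assuming a dataloader that yields batches of 64 real samples:

```python
bce = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

real_labels = torch.ones(64, 1)   # "real" targets for the discriminator
fake_labels = torch.zeros(64, 1)  # "fake" targets

for real_batch in dataloader:  # assumed to yield (64, data_dim) float tensors
    # --- Discriminator step: label real data 1, generated data 0 ---
    fake_batch = generator(torch.randn(64, latent_dim)).detach()  # detach: no G gradients here
    d_loss = (bce(discriminator(real_batch), real_labels)
              + bce(discriminator(fake_batch), fake_labels))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # --- Generator step: try to get generated data labeled 1 ("real") ---
    g_loss = bce(discriminator(generator(torch.randn(64, latent_dim))), real_labels)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```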

Examples

[Image: GAN outputs over the years.]

GANs are used for generative tasks such as:

  • Image Generation and Manipulation

    • Generating photorealistic faces

    • Generating animal photos

    • Coloring grayscale images

  • Deepfakes and Video Manipulation

    • Deepfake videos

    • Filters

  • Synthetic Data Generation

    • Creating artificial data when real data is scarce or restricted by privacy concerns

  • Speech and Audio Generation

    • Voice cloning

    • Music generation

The GAN’s Largest Problem

GANs are notorious for being hard to train: they involve two moving pieces (a generator and a discriminator) instead of just one.

GAN training is a two-player game between the generator and the discriminator. Sometimes, one model improves faster than the other, leading to an imbalance.

  • Techniques such as progressive training or balancing the learning rates of the generator and discriminator can help deal with this problem; a minimal sketch of the latter follows below.
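For example, one common heuristic, the two time-scale update rule (TTUR) from Heusel et al. (2017), simply gives the discriminator a faster learning rate than the generator:

```python
# Two time-scale update rule: the discriminator learns faster than the
# generator. The exact values here are illustrative, not prescriptive.
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=4e-4, betas=(0.5, 0.999))
```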

There’s another problem with GANs: because the two models train against each other, their parameters are effectively tied together.

Since the generator is directly dependent on feedback from the discriminator, their loss functions are interrelated.

Ideally, the generator’s loss curve decreases over time while the discriminator’s increases, with the two “meeting in the middle” at an equilibrium where neither model overpowers the other.

The GAN’s evaluation metrics are also different from those of typical discriminative learning problems: they use specialized metrics that measure the quality and realism of the generated data.

  • Metrics such as Inception Score (IS) or Fréchet Inception Distance (FID) are commonly used to evaluate GAN-generated images.

These metrics measure how similar generated images are to real ones in terms of features, which is important in evaluating GANs.
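As a rough sketch of how FID is computed in practice, here’s one way to do it, assuming the torchmetrics library (with its image extras) is installed; any FID implementation works similarly:

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)  # 2048-dim Inception-v3 features

# Stand-in uint8 image batches in (N, 3, H, W) format; in practice these
# would be your real dataset and the generator's outputs.
real_images = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)

fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print(fid.compute())  # lower FID means generated images are statistically closer to real ones
```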

Here’s a special thanks to our biggest supporters:

Sushant Waidande

If you enjoy our content, consider supporting us so we can keep doing what we do. Please share this with a friend!

Feedback, inquiries, advertising? Send us an email at [email protected].

If you like, you can also donate to our team to push out better newsletters every week!

That’s it for this week’s issue of AI, but simple. See you next week!

—AI, but simple team
