DCGANs: A journey in the latent space

In the Deep Convolutional GANs (DCGANs) paper, it was discussed that GANs have been known to be unstable to train, often resulting in generators that produce meaningless outputs. After extensive model exploration, the authors identified a family of architectures that trains stably across a range of datasets and allows training higher resolution and deeper generative models. The main advances are achieved by the following (a code sketch follows the list):

  1. Using an all-convolutional network that replaces deterministic spatial pooling (such as max pooling) with strided convolutions, allowing the network to learn its own spatial downsampling.
  2. Eliminating fully connected layers on top of convolutional features.
  3. Using Batch Normalization (BN) that stabilizes learning by normalizing the input to each unit to have zero mean and unit variance. This helps when having poor initialization and helps gradients to flow in deep networks. BN is applied to all layers except the generator output layer and the discriminator input layer.
  4. Using the ReLU activation in the generator for all layers except the output, which uses Tanh.
  5. Using the LeakyReLU activation in the discriminator for all layers.
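
To make these guidelines concrete, here is a minimal PyTorch sketch of a DCGAN-style generator and discriminator for 64x64 images. The latent size (100), base width (64), number of image channels and exact layer shapes are illustrative assumptions, not the paper's verbatim configuration.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100, base=64, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            # Project the latent vector with a transposed convolution instead of a
            # fully connected layer (guidelines 1 and 2).
            nn.ConvTranspose2d(z_dim, base * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(base * 8),
            nn.ReLU(True),
            nn.ConvTranspose2d(base * 8, base * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(base * 4),
            nn.ReLU(True),
            nn.ConvTranspose2d(base * 4, base * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(base * 2),
            nn.ReLU(True),
            nn.ConvTranspose2d(base * 2, base, 4, 2, 1, bias=False),
            nn.BatchNorm2d(base),
            nn.ReLU(True),
            # Output layer: no BatchNorm (guideline 3), Tanh activation (guideline 4).
            nn.ConvTranspose2d(base, channels, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, z):        # z: (N, z_dim, 1, 1)
        return self.net(z)       # -> (N, channels, 64, 64)

class Discriminator(nn.Module):
    def __init__(self, base=64, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            # Input layer: strided convolution and LeakyReLU, no BatchNorm (guidelines 1, 3, 5).
            nn.Conv2d(channels, base, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(base * 2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 2, base * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(base * 4),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 4, base * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(base * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # A final 4x4 convolution collapses the features into one real/fake score
            # instead of a fully connected layer (guideline 2).
            nn.Conv2d(base * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):        # x: (N, channels, 64, 64)
        return self.net(x).view(-1)
```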

Walking in the latent space:

In order to understand the latent space, we can walk through it and see how the corresponding generated images change. Sharp transitions are usually a sign of model memorization. On the other hand, if walking in this latent space results in semantic changes to the generated images (such as objects being added or removed), we can reason that the model has learned relevant and interesting representations.

In the figure above, we can see that the learned space has smooth transitions, with every image in the space plausibly looking like a bedroom.

  • In the 6th row, you see a room without a window slowly transforming into a room with a giant window.
  • In the 10th row, you see what appears to be a TV slowly being transformed into a window.
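
A minimal sketch of such a latent-space walk, assuming a trained generator like the one sketched above (the number of steps and the two random endpoints are arbitrary choices):

```python
import torch

@torch.no_grad()
def latent_walk(generator, steps=10, z_dim=100):
    # Pick two random points in the latent space and interpolate between them.
    z_start = torch.randn(1, z_dim, 1, 1)
    z_end = torch.randn(1, z_dim, 1, 1)
    frames = []
    for alpha in torch.linspace(0.0, 1.0, steps):
        z = torch.lerp(z_start, z_end, alpha)   # (1 - alpha) * z_start + alpha * z_end
        frames.append(generator(z))             # a well-trained model changes semantics gradually
    return torch.cat(frames)                    # (steps, C, H, W)
```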

Vector Arithmetic:

As shown in the DCGAN paper, simple vector arithmetic operations on latent vectors reveal a rich linear structure in representation space. Experiments working on only a single sample per concept were unstable, but averaging the Z vectors (latent vectors) of three examples produced consistent and stable generations that semantically obeyed the arithmetic. This was the first demonstration of such a phenomenon in purely unsupervised models.


Steps for vector arithmetic (a code sketch follows the list):

  • For each column, the Z vectors (latent vectors) of the three samples are averaged.
  • Arithmetic was then performed on the mean vectors, creating a new vector Y.
  • The center sample on the right hand side is produced by feeding Y as input to the generator.
  • To demonstrate the interpolation capabilities of the generator, uniform noise sampled with scale 0.25 was added to Y to produce the 8 other samples around the central one.
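
A minimal sketch of these steps, assuming a trained generator and three small batches of latent vectors, one per concept (for example "man with glasses", "man without glasses", "woman without glasses"). The helper name and shapes are illustrative; the 0.25 noise scale comes from the description above:

```python
import torch

@torch.no_grad()
def vector_arithmetic(generator, z_a, z_b, z_c, z_dim=100):
    # 1. Average the three Z vectors of each concept (each input is shaped (3, z_dim, 1, 1)).
    mean_a = z_a.mean(0, keepdim=True)
    mean_b = z_b.mean(0, keepdim=True)
    mean_c = z_c.mean(0, keepdim=True)
    # 2. Arithmetic on the mean vectors, e.g. "man with glasses" - "man" + "woman".
    y = mean_a - mean_b + mean_c
    # 3. The center sample is produced by feeding Y to the generator.
    center = generator(y)
    # 4. Add uniform noise with scale 0.25 to Y to produce the 8 surrounding samples.
    noise = 0.25 * (2 * torch.rand(8, z_dim, 1, 1) - 1)
    neighbors = generator(y + noise)
    return center, neighbors
```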


  • Pose transformation is another trick, where a "turn" vector was created from four averaged samples of faces looking left vs. four looking right. By adding interpolations along this axis to random samples, the authors were able to reliably transform their pose (a small sketch follows).
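
A minimal sketch of the turn-vector idea, assuming z_left and z_right hold the latent vectors of the four left-looking and four right-looking faces (the names and the interpolation range are illustrative):

```python
import torch

@torch.no_grad()
def apply_turn(generator, z_left, z_right, z_sample, steps=7):
    # The difference of the two averaged latent vectors defines the pose ("turn") axis.
    turn = z_right.mean(0, keepdim=True) - z_left.mean(0, keepdim=True)
    alphas = torch.linspace(-1.0, 1.0, steps).view(-1, 1, 1, 1)
    # Adding interpolations along this axis to one sample rotates its pose.
    return generator(z_sample + alphas * turn)
```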

Moreover, it was proposed that one way to build good image representations is to train GANs and later reuse parts of the generator and discriminator networks as feature extractors for supervised tasks.
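
A minimal sketch of that idea, reusing the discriminator from the earlier sketch as a fixed feature extractor. Note that the paper max-pools the features of all the discriminator's convolutional layers before training a linear classifier; keeping only the trunk before the real/fake head, as done here, is a simplifying assumption:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def extract_features(discriminator, images):
    # Drop the final classification convolution and sigmoid, keep the conv trunk.
    trunk = nn.Sequential(*list(discriminator.net.children())[:-2])
    feats = trunk(images)        # (N, base*8, 4, 4) for 64x64 inputs
    return feats.flatten(1)      # flatten for a linear classifier / SVM on top
```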

Reference: Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).
