DCGANs: A journey in the latent space

In the Deep Convolutional GANs (DCGANs) paper, it was discussed that GANs have been known to be unstable to train, often resulting in generators that produce meaningless outputs. After extensive model exploration, the authors identified a family of architectures that trains stably across a range of datasets and allows training higher resolution and deeper generative models. The main advances are achieved by the following (a code sketch follows the list):

  1. Using an all-convolutional network that replaces deterministic spatial pooling (such as max pooling) with strided convolutions, allowing the network to learn its own spatial downsampling.
  2. Eliminating fully connected layers on top of convolutional features.
  3. Using Batch Normalization (BN) that stabilizes learning by normalizing the input to each unit to have zero mean and unit variance. This helps when having poor initialization and helps gradients to flow in deep networks. BN is applied to all layers except the generator output layer and the discriminator input layer.
  4. Using the ReLU activation in the generator for all layers except the output, which uses Tanh.
  5. Using the LeakyReLU activation in the discriminator for all layers.
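
To make these guidelines concrete, here is a minimal PyTorch sketch of a DCGAN-style generator and discriminator for 64x64 images. The latent size (100), base width (64), number of image channels and exact layer shapes are illustrative assumptions, not the paper's verbatim configuration.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100, base=64, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            # Project the latent vector with a transposed convolution instead of a
            # fully connected layer (guidelines 1 and 2).
            nn.ConvTranspose2d(z_dim, base * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(base * 8),
            nn.ReLU(True),
            nn.ConvTranspose2d(base * 8, base * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(base * 4),
            nn.ReLU(True),
            nn.ConvTranspose2d(base * 4, base * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(base * 2),
            nn.ReLU(True),
            nn.ConvTranspose2d(base * 2, base, 4, 2, 1, bias=False),
            nn.BatchNorm2d(base),
            nn.ReLU(True),
            # Output layer: no BatchNorm (guideline 3), Tanh activation (guideline 4).
            nn.ConvTranspose2d(base, channels, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, z):        # z: (N, z_dim, 1, 1)
        return self.net(z)       # -> (N, channels, 64, 64)

class Discriminator(nn.Module):
    def __init__(self, base=64, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            # Input layer: strided convolution and LeakyReLU, no BatchNorm (guidelines 1, 3, 5).
            nn.Conv2d(channels, base, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(base * 2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 2, base * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(base * 4),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 4, base * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(base * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # A final 4x4 convolution collapses the features into one real/fake score
            # instead of a fully connected layer (guideline 2).
            nn.Conv2d(base * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):        # x: (N, channels, 64, 64)
        return self.net(x).view(-1)
```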

Walking in the latent space:

In order to understand the latent space, we can walk through it and see how the corresponding generated images change. Sharp transitions are usually a sign of model memorization. On the other hand, if walking in this latent space results in semantic changes to the generated images (such as objects being added or removed), we can reason that the model has learned relevant and interesting representations.

In the figure above, we can see that the learned space has smooth transitions, with every image in the space plausibly looking like a bedroom.

  • In the 6th row, you see a room without a window slowly transforming into a room with a giant window.
  • In the 10th row, you see what appears to be a TV slowly being transformed into a window.
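
A minimal sketch of such a latent-space walk, assuming a trained generator like the one sketched above (the number of steps and the two random endpoints are arbitrary choices):

```python
import torch

@torch.no_grad()
def latent_walk(generator, steps=10, z_dim=100):
    # Pick two random points in the latent space and interpolate between them.
    z_start = torch.randn(1, z_dim, 1, 1)
    z_end = torch.randn(1, z_dim, 1, 1)
    frames = []
    for alpha in torch.linspace(0.0, 1.0, steps):
        z = torch.lerp(z_start, z_end, alpha)   # (1 - alpha) * z_start + alpha * z_end
        frames.append(generator(z))             # a well-trained model changes semantics gradually
    return torch.cat(frames)                    # (steps, C, H, W)
```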

Vector Arithmetic:

As shown in the DCGAN paper, simple vector arithmetic operations on latent vectors reveal a rich linear structure in representation space. Experiments working on only a single sample per concept were unstable, but averaging the Z vectors (latent vectors) of three examples produced consistent and stable generations that semantically obeyed the arithmetic. This was the first demonstration of such a phenomenon in purely unsupervised models.


Steps for vector arithmetic (a code sketch follows the list):

  • For each column, the Z vectors (latent vectors) of the three samples are averaged.
  • Arithmetic was then performed on the mean vectors, creating a new vector Y.
  • The center sample on the right hand side is produced by feeding Y as input to the generator.
  • To demonstrate the interpolation capabilities of the generator, uniform noise sampled with scale 0.25 was added to Y to produce the 8 other samples around the central one.
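
A minimal sketch of these steps, assuming a trained generator and three small batches of latent vectors, one per concept (for example "man with glasses", "man without glasses", "woman without glasses"). The helper name and shapes are illustrative; the 0.25 noise scale comes from the description above:

```python
import torch

@torch.no_grad()
def vector_arithmetic(generator, z_a, z_b, z_c, z_dim=100):
    # 1. Average the three Z vectors of each concept (each input is shaped (3, z_dim, 1, 1)).
    mean_a = z_a.mean(0, keepdim=True)
    mean_b = z_b.mean(0, keepdim=True)
    mean_c = z_c.mean(0, keepdim=True)
    # 2. Arithmetic on the mean vectors, e.g. "man with glasses" - "man" + "woman".
    y = mean_a - mean_b + mean_c
    # 3. The center sample is produced by feeding Y to the generator.
    center = generator(y)
    # 4. Add uniform noise with scale 0.25 to Y to produce the 8 surrounding samples.
    noise = 0.25 * (2 * torch.rand(8, z_dim, 1, 1) - 1)
    neighbors = generator(y + noise)
    return center, neighbors
```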


  • Pose transformation is another trick, where a "turn" vector was created from four averaged samples of faces looking left vs. four looking right. By adding interpolations along this axis to random samples, the authors were able to reliably transform their pose (a small sketch follows).
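
A minimal sketch of the turn-vector idea, assuming z_left and z_right hold the latent vectors of the four left-looking and four right-looking faces (the names and the interpolation range are illustrative):

```python
import torch

@torch.no_grad()
def apply_turn(generator, z_left, z_right, z_sample, steps=7):
    # The difference of the two averaged latent vectors defines the pose ("turn") axis.
    turn = z_right.mean(0, keepdim=True) - z_left.mean(0, keepdim=True)
    alphas = torch.linspace(-1.0, 1.0, steps).view(-1, 1, 1, 1)
    # Adding interpolations along this axis to one sample rotates its pose.
    return generator(z_sample + alphas * turn)
```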

Moreover, it was proposed that one way to build good image representations is to train GANs and later reuse parts of the generator and discriminator networks as feature extractors for supervised tasks.
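
A minimal sketch of that idea, reusing the discriminator from the earlier sketch as a fixed feature extractor. Note that the paper max-pools the features of all the discriminator's convolutional layers before training a linear classifier; keeping only the trunk before the real/fake head, as done here, is a simplifying assumption:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def extract_features(discriminator, images):
    # Drop the final classification convolution and sigmoid, keep the conv trunk.
    trunk = nn.Sequential(*list(discriminator.net.children())[:-2])
    feats = trunk(images)        # (N, base*8, 4, 4) for 64x64 inputs
    return feats.flatten(1)      # flatten for a linear classifier / SVM on top
```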

Reference: Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).
