From GANs to Transformers - The Magic Behind Generative AI
Many of us have come across the term GPT in ChatGPT — but do you know what it actually stands for? It is Generative Pre-trained Transformer.
So what exactly is a transformer, and what other types of models exist in the world of Generative AI?
Generative AI Models are a class of AI systems that learn from large datasets by recognizing patterns and trends. Instead of just analyzing data, they can create new content — images, music, text, or even video.
The design of these models depends on their purpose. Here are some of the most common and powerful types:
1. Variational Autoencoders (VAEs)
VAEs work by transforming input data through encoding and decoding. They have three main parts - an encoder network, a latent space, and a decoder network.
The encoder takes the input data and compresses it into a simpler form called the latent space representation, which holds the key features of the data. The decoder then uses this latent space representation to create new outputs.
Use cases - Image generation, anomaly detection.
Example - The Fashion MNIST VAE model can generate new images of clothing items (shirts, shoes, bags) by learning patterns from an existing dataset.
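The encode-sample-decode pipeline described above can be sketched in a few lines. This is a minimal illustration, not a trained model: the dimensions are made up, and randomly initialized weight matrices stand in for the trained encoder and decoder networks. It assumes numpy is available.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: 8-dim input compressed into a 2-dim latent space.
IN_DIM, LATENT_DIM = 8, 2

# Random weights stand in for trained encoder/decoder networks.
W_enc_mu = rng.normal(size=(IN_DIM, LATENT_DIM))
W_enc_logvar = rng.normal(size=(IN_DIM, LATENT_DIM))
W_dec = rng.normal(size=(LATENT_DIM, IN_DIM))

def encode(x):
    # Encoder: map input to the latent mean and log-variance.
    return x @ W_enc_mu, x @ W_enc_logvar

def sample_latent(mu, logvar):
    # Sample z = mu + sigma * eps (the "reparameterization trick").
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    # Decoder: map the latent vector back to data space.
    return np.tanh(z @ W_dec)

x = rng.normal(size=(1, IN_DIM))
mu, logvar = encode(x)
z = sample_latent(mu, logvar)
x_recon = decode(z)
print(z.shape, x_recon.shape)  # latent (1, 2) is much smaller than the input (1, 8)
```

Generating a new sample, rather than reconstructing one, simply means drawing `z` directly from the latent space and running only the decoder.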
2. Generative Adversarial Networks (GANs)
GANs involve two neural networks - the generator and the discriminator.
The generator creates new data samples, and the discriminator checks if the data is real or fake. Both networks train together.
The generator tries to make data that looks real, and the discriminator tries to tell the difference between real and fake data. This process continues until the generator becomes so good at producing realistic data that the discriminator can no longer tell the difference.
Use cases - Image synthesis, style transfer, data augmentation.
Example - Nvidia’s StyleGAN generates incredibly realistic faces, animals, and landscapes.
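The generator-versus-discriminator loop can be shown on a deliberately tiny example. This is a toy sketch, not a real GAN: the "real data" is just a Gaussian centred at 4, the generator has a single parameter (a shift), the discriminator is one logistic unit, and the gradients are written out by hand. It assumes numpy is available.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# "Real" data: a Gaussian centred at 4 (a toy stand-in for real images).
def real_batch(n):
    return rng.normal(loc=4.0, scale=1.0, size=n)

theta = 0.0      # generator parameter: fake sample = theta + noise
w, b = 0.1, 0.0  # discriminator D(x) = sigmoid(w*x + b), P(x is real)

lr, batch = 0.05, 64
for step in range(2000):
    real = real_batch(batch)
    fake = theta + rng.normal(size=batch)

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    d_real, d_fake = sigmoid(w * real + b), sigmoid(w * fake + b)
    w -= lr * np.mean(-(1 - d_real) * real + d_fake * fake)
    b -= lr * np.mean(-(1 - d_real) + d_fake)

    # Generator step: push D(fake) toward 1, i.e. fool the discriminator.
    d_fake = sigmoid(w * fake + b)
    theta -= lr * np.mean(-(1 - d_fake) * w)

print(round(theta, 2))  # theta drifts toward the real data's mean of 4
```

The same adversarial dynamic described in the article plays out here in miniature: as the discriminator gets better at separating real from fake, the generator's parameter is pushed until its samples overlap the real distribution and the discriminator can no longer tell the difference.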
3. Autoregressive models
Autoregressive models create data sequentially, one element at a time, with each new element conditioned on the elements generated before it. This makes them well suited to sequence data such as text or music.
Use cases - Text generation, music composition, speech synthesis.
Example - WaveNet by DeepMind generates raw audio waveforms to create natural-sounding speech and even music.
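The "predict the next element from the previous ones" idea can be demonstrated with the simplest possible autoregressive model: a bigram model, where each word is sampled based only on the word before it. Real models like WaveNet or GPT condition on a much longer context with neural networks, but the generation loop has the same shape. The tiny corpus here is made up for illustration.

```python
import random
from collections import defaultdict

corpus = "the cat sat on the mat the cat ran".split()

# Count observed transitions: which words follow each word in the corpus.
transitions = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev].append(nxt)

def generate(start, length, seed=0):
    # Autoregressive loop: each new word depends on the last generated word.
    rng = random.Random(seed)
    seq = [start]
    for _ in range(length - 1):
        options = transitions.get(seq[-1])
        if not options:  # dead end: no continuation ever observed
            break
        seq.append(rng.choice(options))
    return seq

print(" ".join(generate("the", 6)))
```

Every sequence this produces is built one word at a time, which is exactly why autoregressive generation of long sequences (or, for WaveNet, raw audio samples) can be slow.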
4. Transformers
Transformers are most widely used in natural language processing (NLP) tasks. Their key ingredient is the self-attention mechanism, which lets every element of a sequence weigh its relevance to every other element.
They consist of encoder and decoder layers, enabling the model to effectively generate text sequences or perform cross-language translations.
Use cases - Text generation, translations, summarization, chatbots.
Example - Large Language Models like OpenAI’s GPT family and Google’s Gemini use transformers to create human-like responses, generate creative content, and even reason through problems.
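The self-attention operation at the heart of a transformer fits in a few lines of numpy. This is a single attention head with random weights, shown only to make the mechanics concrete: the sequence length and model dimension are arbitrary, and a real transformer stacks many such layers with multiple heads, trained weights, and positional information.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project each token vector into a query, a key, and a value.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Scaled dot-product scores: how much each token attends to each other token.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores)  # each row is a probability distribution
    # Each output is a weighted mix of all tokens' values.
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))            # 4 token vectors
Wq = rng.normal(size=(d_model, d_model))
Wk = rng.normal(size=(d_model, d_model))
Wv = rng.normal(size=(d_model, d_model))

out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one updated vector per token
```

Because every token attends to every other token in one matrix multiplication, transformers process whole sequences in parallel, which is a large part of why they scaled so well compared to earlier sequential models.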
👉 Stay tuned for the next edition - From Single Sense to Many: Unimodal vs. Multimodal AI