Neural Nets That Create: Generative AI
Given enough time, would 1000 monkeys pounding on keyboards compose a Shakespeare-quality work? Replace the monkeys with trained neural nets and you have the essence of Generative AI.
Generative AI has produced pseudo-Shakespeare, music, and clickbait. What follows is a mostly non-technical introduction for the curious, augmented by technical references and code for the adventurous.
What is different about Generative AI?
Your generic "What is a Neural Network (NN)" picture looks like
The NN maps a very large, complex input (the 784-dimensional object represented by a 28-by-28-pixel image) to a much smaller, simpler range (labels describing the image, shown with associated probabilities).
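For the concretely minded, here is a minimal sketch of such a network in PyTorch. The layer sizes are illustrative choices, not anything canonical:

```python
import torch
import torch.nn as nn

# A minimal classifier in the spirit of the diagram: 784 pixels in,
# 10 label probabilities out. Layer sizes are illustrative only.
classifier = nn.Sequential(
    nn.Linear(784, 128),   # 28 x 28 pixels, flattened
    nn.ReLU(),
    nn.Linear(128, 10),    # one score per label
    nn.Softmax(dim=1),     # scores -> probabilities
)

image = torch.rand(1, 784)    # a stand-in for one flattened image
probs = classifier(image)     # shape (1, 10): P(label | image)
print(probs.argmax(dim=1))    # the most likely label
```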
A simple diagram illustrating the Generative AI process is this:
That is, the Generator transforms a simple input domain (random numbers) into a complex output (an image).
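A minimal Generator sketch, in the same spirit (the noise dimension of 64 is an arbitrary illustrative choice):

```python
import torch
import torch.nn as nn

# A minimal Generator: a small random vector in, 784 "pixels" out.
generator = nn.Sequential(
    nn.Linear(64, 256),
    nn.ReLU(),
    nn.Linear(256, 784),
    nn.Sigmoid(),          # squash outputs into [0, 1] pixel intensities
)

z = torch.randn(1, 64)     # the "simple input domain": random numbers
fake_image = generator(z)  # shape (1, 784): a candidate image
```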
The mapping in both cases is "learned" through extensive training. The essential ingredient of training is the Loss Function: a measure of the distance between the network's output and the desired output (the correct label in the case of the classifier NN; a judgement on whether the output is "realistic" in the case of the Generator). The goal of training is to reduce the Loss Function.
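As a sketch, here is what the two flavours of Loss Function might look like in code. The `critic_score` below is a stand-in for the "realistic" judgement (an assumption for illustration; how to obtain it is the subject of the next section):

```python
import torch
import torch.nn as nn

# Classifier loss: distance between predicted scores and the true label.
ce = nn.CrossEntropyLoss()        # expects raw scores (logits)
logits = torch.randn(1, 10)       # stand-in network output
target = torch.tensor([3])        # the correct label
classifier_loss = ce(logits, target)

# Generator loss (conceptually): how far the output is from being judged
# "realistic". Here critic_score near 1 means "realistic"; the -log form
# is one common, illustrative choice.
critic_score = torch.tensor([0.2])
generator_loss = -torch.log(critic_score)   # low score => high loss
```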
How it works. Part 1: Generative Adversarial Networks (GANs)
In implementing the Loss Function for the Generator, who is to judge whether the output is "realistic"? How about another NN?
We start by training an NN ("The Critic") to discriminate between valid (realistic) and corrupted images. That is, the training set is a collection of images labelled as valid or not.
Next, a second NN ("The Student") is trained to generate images. Given a random number as input, The Student creates an image (i.e., 784 pixels, little more than noise at first). The trained Critic judges whether the work is realistic, and this judgement is incorporated into the Loss Function for The Student.
After a while, The Student starts to generate outputs that satisfy The Critic. So let's train a next-generation Critic by feeding it examples of The Student's previously accepted outputs and labelling them as corrupted. This makes the next-generation Critic more discriminating. Train the next-generation Student with this more discriminating Critic and The Student is forced to up its game.
The process repeats, alternating training of The Critic with The Student.
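Putting the pieces together, a minimal sketch of the alternating loop. Sizes, learning rates, and the stand-in batch of "valid" images are all illustrative; real GAN training involves many refinements:

```python
import torch
import torch.nn as nn

# The Critic learns to tell valid images from The Student's fakes;
# The Student learns to fool The Critic.
critic = nn.Sequential(nn.Linear(784, 128), nn.ReLU(),
                       nn.Linear(128, 1), nn.Sigmoid())   # P(realistic)
student = nn.Sequential(nn.Linear(64, 128), nn.ReLU(),
                        nn.Linear(128, 784), nn.Sigmoid())

bce = nn.BCELoss()
opt_c = torch.optim.Adam(critic.parameters(), lr=2e-4)
opt_s = torch.optim.Adam(student.parameters(), lr=2e-4)

real_images = torch.rand(32, 784)   # stand-in for a batch of valid images

for step in range(1000):
    # --- Train The Critic: valid images labelled 1, Student's output 0 ---
    fake = student(torch.randn(32, 64)).detach()   # don't train Student here
    loss_c = (bce(critic(real_images), torch.ones(32, 1)) +
              bce(critic(fake), torch.zeros(32, 1)))
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()

    # --- Train The Student: its loss is low when The Critic says "real" ---
    fake = student(torch.randn(32, 64))
    loss_s = bce(critic(fake), torch.ones(32, 1))  # aim for the "real" label
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()
```

Note the `.detach()` when training The Critic: The Student's work is judged without letting gradients flow back and "help" it; The Student only improves during its own turn.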
How it works. Part 2: Recurrent Neural Networks (RNNs)
A different form of Generative AI uses a generalization of the familiar "feed-forward" Neural Network. A feed-forward network is a collection of neurons in which data flows only "forward" (from layers closer to the input to layers closer to the output).
A Recurrent Neural Network (RNN) further allows connections that flow from the output of a layer back to its input: a "loop". You can imagine unrolling an RNN in time into a very deep multi-layer feed-forward network, one copy of the loop per element of the sequence. This structure gives the RNN a kind of memory.
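The loop is easy to see in code. A sketch using a single recurrent cell (sizes are arbitrary):

```python
import torch
import torch.nn as nn

# The "loop" made explicit: one cell, applied repeatedly, carrying a
# hidden state (the memory) from step to step.
cell = nn.RNNCell(input_size=10, hidden_size=32)

sequence = torch.randn(5, 1, 10)   # 5 steps of a toy sequence
h = torch.zeros(1, 32)             # memory starts empty
for x in sequence:                 # unrolling the recurrence in time
    h = cell(x, h)                 # new memory = f(input, old memory)
```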
An RNN is trained with inputs that are sequences (e.g., of characters, notes). The Loss Function encodes an objective of producing element (N+1) of the sequence, conditioned on having seen the first N elements. That is, the goal is an accurate prediction of what comes next.
Imagine training the RNN with the sequence of characters that comprise the corpus of some author or speaker. After training, we feed an initial short sequence into the RNN and off it goes, extending the sequence ad infinitum. Incipient Shakespeare!
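Here is a sketch of both halves, the training objective and the generation, on a toy corpus. A real char-rnn trains on far more text, for far longer, with a far bigger network:

```python
import torch
import torch.nn as nn

# --- The objective: predict character N+1 from the first N characters ---
corpus = "to be or not to be"                  # a toy stand-in for a corpus
vocab = sorted(set(corpus))
idx = {ch: i for i, ch in enumerate(vocab)}
text = torch.tensor([idx[ch] for ch in corpus])

embed = nn.Embedding(len(vocab), 16)           # characters -> vectors
rnn = nn.RNN(16, 32, batch_first=True)         # the recurrent layer
head = nn.Linear(32, len(vocab))               # memory -> character scores
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(list(embed.parameters()) +
                       list(rnn.parameters()) + list(head.parameters()))

inputs, targets = text[:-1], text[1:]          # the next character is the label
for step in range(500):                        # train: reduce prediction error
    out, _ = rnn(embed(inputs).unsqueeze(0))
    loss = loss_fn(head(out.squeeze(0)), targets)
    opt.zero_grad(); loss.backward(); opt.step()

# --- Generation: seed with a short sequence, then extend it ---
ids = [idx[ch] for ch in "to b"]               # the initial short sequence
out, h = rnn(embed(torch.tensor(ids)).unsqueeze(0))   # warm up the memory
for _ in range(50):
    probs = torch.softmax(head(out[0, -1]), dim=0)
    ids.append(torch.multinomial(probs, 1).item())    # sample what comes next
    out, h = rnn(embed(torch.tensor(ids[-1:])).unsqueeze(0), h)
print("".join(vocab[i] for i in ids))
```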
What is surprising is that, although trained on character sequences, the RNN learns words and structure (balanced parentheses, quotation marks, mark-up). That's the advantage of memory.
What does it all mean?
Does Generative AI represent some form of intelligence? I argue not: mimicking the output of a creative domain is not creativity. The real import is the suggestion that these domains have structure (think of the common elements or "signature" that help humans recognize a Shakespeare or a Picasso). It is this structure that the machines are discovering. Getting the machines to reveal the inferred structure to us is a challenge (and maybe the subject of a follow-up article).
Explore
This introduction was necessarily high-level, and some (hopefully minimal) liberties were taken for the sake of simplicity. For those wanting to explore more deeply, references to articles and code follow:
- For those unfamiliar with training Neural Networks, the procedure is roughly this: Feed in a training example (input) and get an output. The initial output will be little better than noise because we've provided no direction for its creation. The Loss Function measures the "distance" between the actual and desired outputs. Nudge the parameters of the NN so as to reduce the Loss Function. Repeat; a lot! (A minimal version of this loop appears in the sketch after this list.)
- The random input to the Generator can be thought of as an index into the distribution of realistic images; it's just a way to discourage the Generator from producing a single, constant image.
- Generative Adversarial Networks (GANs) in 50 Lines of Code is a nice introduction to GANs, with code. For a more technical introduction, the seminal paper by Goodfellow et al. is hard to beat.
- A really interesting use of GANs: two NNs learn to communicate privately (with the benefit of a shared key), while a third NN eavesdrops and tries to decrypt the message. A very accessible but technical reference: Learning to Protect Communication with Adversarial Neural Cryptography. Here's an implementation: code
- Andrej Karpathy's influential blog post The Unreasonable Effectiveness of Recurrent Neural Networks is a great introduction to RNNs. Code included (Github char-rnn). Read the comments to see what this has inspired others to create!
- Manuel Araoz shows how to generate music with RNNs in Training a Recurrent Neural Network to Compose Music, which is the source of the music example above.
- What happens if you train an RNN on Buzzfeed and Gawker? Lars Eidnes teaches an RNN to generate clickbait: Auto-Generating Clickbait with Recurrent Neural Networks.
- Picture credit: derived from Chimpanzee seated at a typewriter.tif, Public Domain, https://commons.wikimedia.org/w/index.php?curid=19075009
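Finally, as promised in the first bullet, the training procedure in miniature. This sketch uses a single stand-in example; real training streams over many:

```python
import torch
import torch.nn as nn

# The training loop from the first bullet above, in miniature:
# feed an example, measure the loss, nudge the parameters, repeat.
net = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(net.parameters(), lr=0.1)

example = torch.rand(1, 784)   # stand-in training input
label = torch.tensor([7])      # stand-in desired output

for step in range(1000):       # "Repeat; a lot!"
    loss = loss_fn(net(example), label)   # distance from desired output
    opt.zero_grad()
    loss.backward()            # which way should each parameter move?
    opt.step()                 # nudge the parameters downhill
```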