The Fourth of Four Key AI Concepts, Joyfully Explained
My friendly friends, last time we made another pass at Concepts #2 and #3, global minimum and gradient descent. Today let's take another swipe at Concept #4, back propagation. Then we'll complete Section 1 of 5 with a summary of all four concepts, and then we'll do the Hokey Pokey and turn ourselves around (after all, that's what it's all about...)
Back Propagation is the Main Tool for Achieving Gradient Descent
Back propagation is the method we use to compute the gradient, and that gradient value then tells us how much to re-prioritize each weight (i.e., each relationship between feature questions) in the next iteration. Each point on the dotted line as the prediction-ball descends to the bottom of the bowl is one iteration of the neural network.
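The real code arrives in the next sections, but if you'd like a tiny preview, here is a minimal sketch in Python of how a gradient nudges a single weight. Everything here (the numbers, the single weight, the squared error) is invented for illustration, and is not our actual pet shop network:

```python
# A minimal sketch with invented numbers, not our real network:
# one survey answer, one weight, a squared error, and one gradient step.

x = 0.5             # the customer's answer to one survey question
truth = 0.8         # The Actual Truth we want to predict
weight = 0.1        # how much emphasis we currently give this question
learning_rate = 0.1

prediction = weight * x                # feed forward (one tiny step of it)
error = (prediction - truth) ** 2      # how far are we from The Truth?

# Back propagation: the chain rule gives the gradient of the error
# with respect to the weight: d(error)/d(weight) = 2 * (prediction - truth) * x
gradient = 2 * (prediction - truth) * x

# The gradient tells us which way, and how much, to re-prioritize the weight.
weight = weight - learning_rate * gradient
print(weight)       # nudged slightly toward a better prediction
```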
So, what is an iteration? That's a key question. Think of it as one try, or "trial," in our trial-and-error learning process of training a network to predict accurately. The first part of an iteration is our feed forward, which we saw above. It is the feed forward that sets our prediction-ball at its starting position at the top of the white, dotted line, whence it rolls down the curvy surface of our red bowl towards the bottom, which sits on the white grid where near-zero error lives (i.e., the global minimum). Each dot of that dotted white path represents one tweak, or update, of the network's proposed combination of survey questions and its balance of emphasis, yielding a slightly better prediction.
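To make "iteration" concrete, here is the same toy setup as above, run in a loop: each pass through the loop is one iteration, one dot on the white path. Again, this is a hedged sketch with made-up numbers:

```python
# The same toy setup run as a loop: each pass is one iteration,
# one dot on the white path down the bowl. Invented numbers throughout.

x, truth = 0.5, 0.8
weight, learning_rate = 0.1, 0.5

for iteration in range(10):
    prediction = weight * x                    # feed forward
    error = (prediction - truth) ** 2          # measure distance to The Truth
    gradient = 2 * (prediction - truth) * x    # back propagation
    weight -= learning_rate * gradient         # tweak the emphasis
    print(iteration, round(error, 5))          # watch the error roll downhill
```

If you run this, you'll see the printed error shrink with every iteration, exactly like the ball rolling toward the bottom of the bowl.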
For your convenience, here's the curvy red bowl again, to refer to as you read the next paragraph:
Here's the key thing to visualize: the path of our prediction-ball as it rolls down the inside surface of our curvy red bowl is erratic. You can see that the ball rolls over some bumps, rolls into some dips, changes direction suddenly, and so on. To understand what's happening, let's start with our vertical axis and imagine that the yellow arrow (our vertical, Z coordinate) moves with the ball, always staying directly beneath it as it rolls erratically down the side of the bowl. The yellow arrow therefore changes length constantly: with each lurch the prediction-ball makes along the surface of the red bowl, the yellow arrow lurches right along underneath it. As the ball gets closer to the bottom of the bowl, the yellow arrow gets shorter and shorter until it is close to zero, where there is almost no difference between our prediction and The Actual Truth. That means our prediction is accurate. And when the yellow arrow equals zero, our horizontal coordinates on the X and Y axes must sit right under the bottom of the bowl (aka the global minimum).
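If you'd like to see that shrinking yellow arrow in numbers, here is a hedged Python sketch using a smooth stand-in bowl, z = x**2 + y**2, in place of our bumpy red one. The z value plays the role of the yellow arrow, and (x, y) are the ball's horizontal coordinates:

```python
# A smooth stand-in for the curvy red bowl: z = x**2 + y**2.
# The real error surface of a network is far bumpier, but the idea holds.

x, y = 2.0, -1.5         # the ball's starting spot near the bowl's rim
learning_rate = 0.1

for step in range(25):
    z = x ** 2 + y ** 2              # the yellow arrow: the ball's height (the error)
    grad_x, grad_y = 2 * x, 2 * y    # the slope of the bowl points uphill...
    x -= learning_rate * grad_x      # ...so we step downhill on both axes
    y -= learning_rate * grad_y
    print(step, round(z, 4))         # the yellow arrow shrinks toward zero
```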
The diagram at this link may also be helpful in envisioning the geometry of neural networks. It's essentially another version of the same red bowl above, seen from a slightly different angle.
Let's conclude Section 1, The Big Picture, with a summary:
Gradient Descent is the overall process of a network learning by trial-and-error until its predictions are accurate (i.e., have minimum error). It is like the pink ping pong ball rolling down the side of the red, curvy bowl towards the bottom, the perfect Global Minimum. The network is like the pink prediction-ball, and the surface of the red bowl is made up of every prediction the network could possibly make. Gradient descent is the prediction-ball rolling down the surface of that "bowl of predictions" to the bottom, where the overall error in prediction is at a minimum (the Global Minimum).
Feed Forward reminds me of an old-fashioned 1960s IBM computer that fills a room, with punch cards being fed into one end and a fabulous prediction spitting out the other. The network takes the data from the survey's three questions and "feeds it through the computer" to arrive at a prediction. A prediction is like a freeze-frame photo of where the ball is located in the bowl at a given moment.
Global Minimum: Again, picture the red bowl sitting on a white table. The place where the bowl meets the table's surface represents a near-perfect prediction with minimal error. Compared to the entire surface of the "bowl of error" (think "the global surface"), the bottom is closest to perfection. It has the "global minimum" of error.
Each time the network makes a better prediction, the pink prediction-ball rolls down the bowl's sides and approaches that global minimum of error at the bottom. After each prediction, the network compares that prediction to survey question four, The Actual Truth. This is like measuring how far the prediction-ball is from the bottom of the bowl at a given moment. Measuring this distance between the prediction and The Truth is called finding the error. The network's goal with each prediction is to steadily reduce that error toward the global minimum.
Back Propagation: Picture a circus juggler who can juggle 16 bowling pins of different sizes and weights. He keeps them all in the air at the same time, even as he (magically) adjusts their size and weight on the fly. That is back propagation: after making a prediction, the network works backwards through its previous prediction process to find out what went wrong and fix it, adjusting all of its weights at once, juggler-style. The network is asking, "What adjustments would lessen the error in the next prediction, thereby moving the ball down the bowl to the global minimum?" (See the short sketch just after this summary.)
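Before we move on, here is one last hedged sketch that ties all four concepts into a single trial-and-error loop: feed forward, measuring the error, back propagation, and gradient descent toward the global minimum. The numbers and the bare-bones linear "network" are invented for illustration; the real pet shop network comes later:

```python
# A toy tie-together of all four concepts (illustrative only).
# Three survey answers in, one prediction out, trained by trial and error.

answers = [1.0, 0.0, 1.0]    # the customer's answers to questions 1-3
truth = 1.0                  # question 4: The Actual Truth
weights = [0.2, 0.2, 0.2]    # starting emphasis on each question
learning_rate = 0.1

for iteration in range(20):
    # Feed forward: combine the answers using the current emphasis.
    prediction = sum(w * a for w, a in zip(weights, answers))
    # Measure the error: how far is the ball from the bottom of the bowl?
    error = (prediction - truth) ** 2
    # Back propagation: work backwards to see how each weight moved the error.
    gradients = [2 * (prediction - truth) * a for a in answers]
    # Gradient descent: nudge every weight at once, juggler-style.
    weights = [w - learning_rate * g for w, g in zip(weights, gradients)]

print([round(w, 3) for w in weights], round(error, 6))  # near the global minimum
```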
My goal so far has been to give you a general understanding of how our neural network can train itself on past customer surveys from the pet shop. Next, we're going to take a look under the hood and learn the code that makes our network learn by trial-and-error. Hopefully, now that you can see what a neural network does in 3D, it will be easier to understand why we take all the following abstract steps with math and with code.
Again: if you are still confused by the above diagrams and analogies, please don't worry at all. This is not a novel, and you are going to read it more than once. With each step up the "upward spiral" staircase of knowledge, you will gain more and more insight. Godspeed, keep breathing deeply, and do not beat yourself up!
You have completed Section 1 of 5! Let us celebrate together with an uplifting photo of a clerk in a Kyoto tea shop. She obviously takes great pride in her work. Can you say the same about your diligent studies of AI? Well, can ya?