#27) Change Comes From Within (within the chain rule, that is...)

My beloved students, today could easily become the Greatest Day Of Your Life as we continue our sojourn in Section 5, exploring how our AI code synchs up with the math of the Chain Rule. In a burst of artistic creativity, I have (cleverly) decided to entitle this chapter, "How the Code Synchs Up with the Math." I know, right? I came up with it all by myself:

5.5) How the Code Synchs Up with the Math:

[Image: lines 66 to 115 of our Python code, with red lines connecting them to our ratios of change]

5.5.a) Removing the Intermediary Variables

My dear students, you will notice with great joy that lines 66 to 115 of our Python code appear above, with (fashionable) red lines connecting them to our ratios of change. If those connections don't look consistent to you, that is only because our original code breaks the back propagation process down into several intermediary steps with several extra, intermediary variables. Below I want you to take a look at the code after I remove these four intermediary variables:

  1. l2_error;
  2. l2_delta;
  3. l1_error; and
  4. l1_delta

Start by studying the top pieces of code (with red arrows attached). Pretend the four intermediary variables have disappeared. What does that leave? Now take a look at the bottom line of the diagram below, with the green arrows pointing upward. You'll see that the bottom line of code with green arrows is what remains when we remove the intermediary variables from the red-arrowed chunks of code. All three rows now synch up perfectly. Below, I'll explain what I mean:
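The original lines 66 to 115 aren't reproduced here, but a minimal sketch in the spirit of that code (the shapes, random weights, and input values are my own illustrative assumptions, not the article's actual numbers) shows what "removing the intermediaries" means: the four named variables are just stepping stones, and substituting them away produces a single line that computes the identical weight update.

```python
import numpy as np

def slope(x):
    # Slope of the sigmoid, written in terms of its output:
    # if x = sigmoid(z), then d(sigmoid)/dz = x * (1 - x).
    return x * (1 - x)

sigmoid = lambda z: 1 / (1 + np.exp(-z))

# Hypothetical tiny network state, for illustration only.
l0 = np.array([[1.0, 0.0, 1.0]])   # input layer (customer answers)
syn0 = np.random.randn(3, 4)       # weights, layer 0 -> layer 1
syn1 = np.random.randn(4, 1)       # weights, layer 1 -> layer 2
y = np.array([[1.0]])              # the "right answer"

l1 = sigmoid(l0 @ syn0)            # hidden layer
l2 = sigmoid(l1 @ syn1)            # prediction

# Back propagation WITH the four intermediary variables:
l2_error = y - l2
l2_delta = l2_error * slope(l2)
l1_error = l2_delta @ syn1.T
l1_delta = l1_error * slope(l1)
syn1_update = l1.T @ l2_delta
syn0_update = l0.T @ l1_delta

# The same updates WITHOUT the intermediaries (the chain rule in one line):
syn1_update_direct = l1.T @ ((y - l2) * slope(l2))
syn0_update_direct = l0.T @ ((((y - l2) * slope(l2)) @ syn1.T) * slope(l1))

assert np.allclose(syn1_update, syn1_update_direct)
assert np.allclose(syn0_update, syn0_update_direct)
```

The intermediaries exist only for readability; mathematically, the one-line versions and the step-by-step versions are the same expression.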

[Image: the same chunks of code with red arrows on top, and on the bottom line, marked with green arrows, the code that remains once the intermediary variables are removed]

5.5.b) How the Code Calculates the Same Variable that Each Ratio of Change Calculates

Mis amigos, you may recall that, with confidence measures, when we took the slope of our statistical probability (that number between 0 and 1), we simply computed rise over run using the function x(1 - x): x is the rise, 1 - x is the run.
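A quick numerical sanity check (the value of z below is my own arbitrary choice) confirms that x(1 - x) really is the rise-over-run slope of the sigmoid at its output x:

```python
import numpy as np

sigmoid = lambda z: 1 / (1 + np.exp(-z))

z = 0.3                      # arbitrary input, chosen for illustration
x = sigmoid(z)               # our statistical probability, between 0 and 1

analytic = x * (1 - x)       # the x(1 - x) slope formula

# Rise over run, measured directly with a tiny nudge:
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)

assert abs(analytic - numeric) < 1e-9
```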

But here, we're computing something different, and more accurate. We want to know how much a given CHANGE in syn0,1 will cause a CHANGE in l2_error. It's like saying, "A change in the run, which is syn0,1, will cause how much of a change in the rise, which is l2_error?" So, in order to figure out the rate of change in d l2_error/d syn0,1 we are going to break that big ratio of change down into 5 little parts, 5 little ratios of change, 5 little cases of "this change in run causes this much change in rise." In other words, we're going to examine each link of the chain rule separately.
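Written out in one line, using the article's own variable names, the five links multiply together like this:

d l2_error/d syn0,1 = (d l2_error/d l2) x (d l2/d l2_LH) x (d l2_LH/d l1) x (d l1/d l1_LH) x (d l1_LH/d syn0,1)

Each factor on the right is one link of the chain, and the product of all five gives the big ratio on the left.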

Let's walk through each of these ratios of change, from right-to-left, to make sure you understand how the code synchs up with the math:

For d l1_LH / d syn0,1: We know that l0,1 is 1 (the "yes" answer of customer one to, "Do you own a cat?"). Therefore, the ratio of change will always be 1, because no matter what value you give syn0,1, l1_LH will always be that value times l0,1, which is one. This makes sense because l1_LH divided by syn0 will always equal l0.
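You can see this first link directly in code (the sample weight values below are arbitrary, chosen just to show the ratio doesn't depend on them):

```python
# l1_LH is just syn0,1 times the fixed input l0,1, so its slope with
# respect to the weight is l0,1 itself (here 1, the "yes" to the cat question).
l0_1 = 1.0

def l1_LH(syn0_1):
    return syn0_1 * l0_1

h = 1e-6
for syn0_1 in (0.1, 0.5, 2.0):           # any weight value you like
    rise_over_run = (l1_LH(syn0_1 + h) - l1_LH(syn0_1)) / h
    assert abs(rise_over_run - l0_1) < 1e-6   # always equals l0,1, i.e. 1
```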

For d l1 / d l1_LH: In the top code, when you remove the intermediary variables l1_delta and l1_error, you are left with only finding the slope of l1. This is exactly the same as when we took the confidence measure of l1 in Section 4.5 of 5. So, d l1 divided by d l1_LH will always equal the slope of l1.

For d l2_LH / d l1: In the top code, when you remove the intermediary variables l1_error and l2_delta, you are left with only syn1,1. This makes sense, because d l2_LH divided by d l1 will always equal syn1.

For d l2 / d l2_LH: In the top code, when you remove the intermediary variables l2_delta and l2_error, you are left with only the slope of l2. This makes sense, because d l2 divided by d l2_LH will always equal the slope of l2.

For d l2_error / d l2: The "-1" in the bottom green code makes sense because the relationship is a negative correlation. For example, take y - l2 = l2_error, and for our first customer 1 - 0.5 = 0.5. If you increase l2 by 0.1, then 1 - 0.6 = 0.4. In other words, l2_error decreased by 0.1 when we increased l2 by 0.1. That's a 1-to-1 negative correlation.
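To tie the five links together, here is a sketch that follows one scalar path through the network (the weight and input values are my own illustrative assumptions). It computes each of the five little ratios, multiplies them, and checks the product against a direct rise-over-run measurement of the whole chain:

```python
import numpy as np

sigmoid = lambda z: 1 / (1 + np.exp(-z))

# Hypothetical scalar slice of the network: one input, one weight per layer.
l0_1, syn1_1, y = 1.0, 0.7, 1.0
syn0_1 = 0.4                             # the weight we're nudging

def forward(w):
    l1_LH = w * l0_1
    l1 = sigmoid(l1_LH)
    l2_LH = l1 * syn1_1
    l2 = sigmoid(l2_LH)
    return l1, l2, y - l2                # last item is l2_error

l1, l2, l2_error = forward(syn0_1)

# The five little ratios of change, right to left:
r1 = l0_1                                # d l1_LH / d syn0,1
r2 = l1 * (1 - l1)                       # d l1 / d l1_LH  (slope of l1)
r3 = syn1_1                              # d l2_LH / d l1
r4 = l2 * (1 - l2)                       # d l2 / d l2_LH  (slope of l2)
r5 = -1.0                                # d l2_error / d l2 (negative correlation)

chain = r1 * r2 * r3 * r4 * r5

# Direct rise-over-run on the whole chain, for comparison:
h = 1e-6
numeric = (forward(syn0_1 + h)[2] - forward(syn0_1 - h)[2]) / (2 * h)

assert abs(chain - numeric) < 1e-9
```

The product of the five little ratios agrees with the directly measured slope, which is exactly the claim of the chain rule.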

It's key that you understand that all the slopes we are calculating above are being evaluated at the CURRENT STATE OF THE NETWORK (i.e. with weights fixed at the values used in feed forward). To use our juggler's analogy, it's like he can magically stop time for a moment and take a snapshot of all 16 pins in the air at that moment. This is like our prediction at the end of one forward feed. Since one bowling pin has changed in size, he now magically adjusts the sizes of the other 15 pins, still in mid-air, before he magically starts time moving again (i.e., the next iteration).

I hope you are beginning to see the amazing power of the chain rule to juggle all the weights of a neural network while adjusting them relative to each other. The chain rule is the guts of back prop, which is the guts of gradient descent.

Again: our goal is to calculate these 5 ratios and multiply them together in order to find the ultimate ratio of how much a change in our butterfly, syn0,1, creates the change we want in our hurricane, the l2_error. How do we calculate those ratios? Next, let's take one example of one weight and walk through all the steps of the math of the chain rule.

OK, I think that's enough material to keep you from bingeing on TV or ice cream for today, so study hard (and re-read this stuff about 800 times), and remember to find the joy even in the tiny moments, like this stunning travel photo of a street vendor of tea in Cairo (I LOVE those plastic flowers on top of his teapot!).

[Photo: a street vendor of tea in Cairo, plastic flowers on top of his teapot]
