Deep learning part 1 – Understanding the process

Sameer Kumar

Published Jan 29, 2018

This article tells the story of the discovery of gravity, with a twist. The twist has been introduced to make the concept of deep learning easier to understand. This may be a weird way of explaining the concept, but please bear with me. I am sure you will not be disappointed. But if you are, please send me your brickbats so that I can improve.

Disclaimer: Though the characters are real, the facts have been twisted and most of the story is imaginary. The twisting is without malice and towards a goal.

This story is about Isaac, the curious boy. One day he was sleeping under an apple tree when an apple fell on his head and woke him up. This triggered a train of thought. He was curious why the apple fell on his head and did not fly away like a bird. He was determined to find out. He was sure that the earth was pulling the apple towards itself. But if God has made everything, he will not discriminate and will make everything equal. So if the big earth is pulling the small apple, the small apple must also be pulling the big earth towards itself. And if so, then all the masses in the world, and maybe in this universe, are pulling each other. He felt smart about this discovery. But though he was sleeping under the apple tree, he was not a lazy child. Once his train of thought was triggered, he wanted to follow it to its logical conclusion.

He conducted experiments to find the force between various masses and listed the results in a table. He coined a unit, “Newton”, to describe the force between the masses. Following is a reproduction of a fraction of his table. He actually conducted hundreds of thousands of experiments.

Don’t ask me how he was able to conduct his experiments or how he managed such big masses, because he did not explain it to me. But what he did explain was even weirder.

He said that the masses m1 and m2 and the distance r between them are his input data, and the force between them is the output data. He called the input data X and the output data Y. So far so good. But then he said that he wanted to find “Y hat”. This “Y hat”, he said, would be the same as Y, but instead of conducting an experiment he would calculate its value using some mathematical formula. He was not yet sure how he would do this calculation. But he was sure that if he could find that formula, he could find the force between any two masses in the universe.

We met again after a few weeks. He looked gloomy. He was sure that the mathematical formula would look something like this:

Y hat = G * (m1 * m2) / r^2

He told me that he got some help from his friend Kepler to arrive at this conclusion, but he did not see fit to explain to me how the two boys reached it. I was still happy that he shared these details with me, which looked interesting. But then I noticed that he had introduced a constant in his calculation. He was gracious enough to tell me that this is a new input, which he called the Gravitational constant, G. He had tried putting m1, m2 and r into many mathematical functions, but the values they calculated for “Y hat” were too big. So he thinks that if he multiplies the output of his mathematical function by this small number, he will get near “Y” (if you remember from a few paragraphs back, Y is the actual force he measured and “Y hat” is the force he is trying to calculate using his mathematical formula). He has estimated that the value of G would be on the scale of 10 to the power of -11.
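To make the role of G concrete, here is a minimal Python sketch of the calculation Isaac had in mind. It assumes the formula is the familiar inverse-square law, Y hat = G * m1 * m2 / r^2. The function name `predict_force` and the sample masses are made up for illustration and are not taken from Isaac's table.

```python
def predict_force(m1, m2, r, G):
    """Return "Y hat": the calculated force in Newton between two masses.

    m1 and m2 are masses in kilograms, r is the distance in metres.
    Assumes the inverse-square form Y hat = G * m1 * m2 / r**2.
    """
    return G * m1 * m2 / r ** 2


# Without the small constant (G = 1) the raw formula gives a huge number.
raw = predict_force(1000.0, 2000.0, 10.0, G=1.0)

# With a guess on the scale of 10 to the power of -11 it becomes tiny.
scaled = predict_force(1000.0, 2000.0, 10.0, G=1e-11)

print(raw)     # raw output of the formula, far too big to be a force
print(scaled)  # same formula, scaled down by the guessed constant
```

The only unknown left in the function is G, which is exactly the value Isaac still has to find.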

We did not meet for a long time after this conversation. But one day he burst into my room, looking very excited. His friend Yaan had invented a gadget called “Neuron” which was speeding up his research. He could now calculate his “Y hat” more quickly. He showed me a rough sketch of how this neuron works. His dear friend Henry was helping him to run these iterations.

He passes the values of X (m1, m2 and r) and G to the neuron. He has instructed the neuron to use the formula shown within the Neuron in this diagram, and he gets the value of “Y hat”. He then compares the “Y hat” value to the Y value. Depending on the difference, he adjusts the value of G and runs the next iteration. Isaac told me that Henry is pretty good at managing such computations, and that very soon he expects to find the value of G which will result in zero difference between Y and “Y hat” (a little reminder again: Y is the actual force he measured and “Y hat” is the force he is trying to calculate using his mathematical formula).
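The loop Isaac describes (calculate “Y hat”, compare with Y, adjust G, repeat) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the “measurements” are faked from a hidden constant, the formula is assumed to be G * m1 * m2 / r^2, and the adjustment rule is plain gradient descent on the squared difference, which the story itself does not specify.

```python
import random

random.seed(0)
HIDDEN_G = 6.674e-11  # used only to fake the "measured" Y values

# Each experiment: m1 (kg), m2 (kg), r (m) and the measured force Y (Newton).
experiments = []
for _ in range(1000):
    m1 = random.uniform(1e3, 1e6)
    m2 = random.uniform(1e3, 1e6)
    r = random.uniform(1.0, 100.0)
    y = HIDDEN_G * m1 * m2 / r ** 2
    experiments.append((m1, m2, r, y))

# The part of the formula that does not depend on G.
features = [m1 * m2 / r ** 2 for m1, m2, r, _ in experiments]
targets = [y for _, _, _, y in experiments]
n = len(experiments)

G = 1e-11  # Isaac's initial guess: right scale, wrong value
# Step size derived from the data so the updates neither explode nor stall.
step = 0.1 / (sum(f * f for f in features) / n)

for iteration in range(200):
    # Average of (Y hat - Y) * feature: the slope of the squared error in G.
    slope = sum((G * f - y) * f for f, y in zip(features, targets)) / n
    G -= 2 * step * slope

print(G)  # should end up very close to HIDDEN_G
```

Each pass shrinks the gap between G and the hidden value by a fixed fraction, which is why a few hundred iterations are enough here.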

And indeed, a few days later Henry proudly declared G to be 6.674 * 10 to the power of (-11).

The success of using the neuron to solve this gave Yaan a lot of satisfaction. It then set him off in a new direction. He started trying to develop a machine which can look at a picture and tell you whether it is a picture of a cat or not. He thought he could probably use the neuron. He told me a bit about the approach he was taking. He had hundreds of thousands of pictures of “cat” and “not a cat”, and he had converted them into pixel values in matrix form. He represented each picture by three 64 * 64 matrices, one each for the pixel values of the colours red, green and blue, and he had marked each picture as “cat” or “not a cat” by looking at it. He wanted to create a gadget. Using these pictures, Yaan wanted to teach the gadget so that it can look at any new picture and identify whether it is a cat or not. And he thought putting a neuron in this gadget would do the trick.
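As a rough illustration of this representation, the following Python sketch builds one picture as three 64 * 64 matrices and unrolls them into a single list of input values. The pixel values are placeholders generated by a made-up helper, `placeholder_channel`; a real picture would supply measured colour intensities instead.

```python
SIZE = 64


def placeholder_channel(offset):
    """Return a made-up 64 x 64 matrix of pixel values for one colour."""
    return [[(row + col + offset) % 100 for col in range(SIZE)]
            for row in range(SIZE)]


picture = {
    "red": placeholder_channel(0),
    "green": placeholder_channel(10),
    "blue": placeholder_channel(20),
}
label = 1  # 1 means "cat", 0 means "not a cat"; assigned by looking at the picture

# Unroll the three matrices, row by row, into one long list of inputs.
x = [value
     for colour in ("red", "green", "blue")
     for row in picture[colour]
     for value in row]

print(len(x))  # 12288
```

The order of the colours and rows does not matter much, as long as every picture is unrolled in the same way.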

He explained that Isaac had three input variables, m1, m2 and r, whereas he has 64 * 64 * 3 (= 12288) input values. These 12288 values represent a picture. Any change in any of the 12288 values will change the picture. Each of these 12288 values can take a value from 0 to 255. So the total number of possible pictures is 256^12288, that is, 256 multiplied by itself 12288 times. And he wants the gadget to identify each of these pictures as “cat” or “not a cat”. Going through all of these possible pictures and storing them in memory is out of the question. Apart from the fact that it would take a lot of memory, the solution would not be scalable if he used pictures of higher or lower resolution. So he was looking for an intelligent device which can learn by looking at the pictures he has labeled and then identify any new picture as “cat” or “not a cat”. He was sure that the neuron could help him. The set of matrices representing the labeled pictures (labeled as “cat” or “not a cat”) could be designated as input X. The way the constants are identified will be different. Instead of the one constant which Isaac had to find (the gravitational constant G), he will have to find a number of constants. And yes, he probably cannot use the mathematical formula Isaac used to find the gravitational force. He will have to identify a different one. Or he will probably have to identify more than one mathematical formula.
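To get a feel for how hopeless it would be to store every possible picture, here is a small Python sketch that counts the decimal digits in the number of possible pictures. It assumes the usual 8-bit colour depth, i.e. 256 levels per value; change `LEVELS` if your pictures use a different range.

```python
import math

INPUTS = 64 * 64 * 3   # 12288 values per picture
LEVELS = 256           # assumption: 8-bit colour, values 0 to 255

# LEVELS ** INPUTS is far too long to print, so count its decimal digits.
digits = math.floor(INPUTS * math.log10(LEVELS)) + 1

print(INPUTS)
print(digits)
```

Under this assumption the count runs to tens of thousands of digits, which is why Yaan wants a gadget that learns from examples rather than a lookup table.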

I did not see Yaan for a few weeks. But when I did, he was deep in thought, walking in the central park with a rolled-up sheet of paper under his arm. I had to call him twice before he responded. He had been able to formulate how he was going to use the neurons, but he realised that he would need a lot of computing power to crack it. Here is a representation of what he proposed to do.

Holy cow! This surely looks complicated. He sat down on a bench to explain this to me. In Isaac’s case, only one neuron was used. Why does Yaan need so many? He replied that this is just the tip of the iceberg. He suspects he will need many more. His explanation was that in Isaac’s case there were only three inputs, so a simple formula, a single constant and one neuron did the trick. Here we have 12288 inputs for a 64 * 64 pixel picture. Just imagine the number of pixels needed to represent pictures of finer resolution. But I think we are jumping the gun. Let me first explain the above diagram.

The proposal was that he will take three neurons at the first level and will pass the 12288 pixel values of a picture to each of them. In each neuron, he would use a mathematical formula which looks like this for the first neuron:

z = w1[0,0] * x1 + w1[0,1] * x2 + w1[0,2] * x3 + ... + w1[0,12287] * x12288 + b[0,1]

And then the neuron will apply a function called “Relu” to the z value:

a = Relu(z)

The x values are the pixel values of a single picture. The w and b values represent the constants which need to be found. So while Isaac had to find only one constant (the G value), Yaan needs to find the values of many w and b.
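A single neuron of this kind can be sketched in Python as follows. Relu simply keeps positive values and replaces negative values with 0. The input values and the starting constants below are random stand-ins, not values from Yaan's pictures, and the lists are indexed from 0 where the formula above counts from x1.

```python
import random


def relu(z):
    """Relu keeps positive values and turns negative values into 0."""
    return max(0.0, z)


def neuron(x, w, b):
    """Compute z = w[0]*x[0] + w[1]*x[1] + ... + b, then a = Relu(z)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return relu(z)


random.seed(2)
n_inputs = 12288
x = [random.random() for _ in range(n_inputs)]               # stand-in pixel values
w = [random.uniform(-0.01, 0.01) for _ in range(n_inputs)]   # constants still to be found
b = 0.0

a = neuron(x, w, b)
print(a)
```

For this one neuron alone there are 12288 w values and one b value to find.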

So the output of the first neuron would be:

a[0,1] = Relu(x1 * w1[0,0] + x2 * w1[0,1] + x3 * w1[0,2] + ... + x12288 * w1[0,12287] + b[0,1])

Output of second neuron:

a[0,2] = Relu(x1 * w1[1,0] + x2 * w1[1,1] + x3 * w1[1,2] + ... + x12288 * w1[1,12287] + b[0,2])

And the output of third neuron:

a[0,3] = Relu(x1 * w1[2,0] + x2 * w1[2,1] + x3 * w1[2,2] + ... + x12288 * w1[2,12287] + b[0,3])

Then these three neurons will pass their outputs to the two neurons in the second level. So the outputs of the two second-level neurons would be:

Output of 4th neuron (First neuron of second level):

a[1,1] = Relu(a[0,1] * w2[0,0] + a[0,2] * w2[0,1] + a[0,3] * w2[0,2] + b[1,1])

Output of 5th neuron (Second neuron of second level):

a[1,2] = Relu(a[0,1] * w2[1,0] + a[0,2] * w2[1,1] + a[0,3] * w2[1,2] + b[1,2])

And on the last level, he put only one neuron and used a different mathematical function called Sigmoid. So the output of the final level (Neuron no. 6) would be:

y hat = Sigmoid(a[1,1] * w3[0,0] + a[1,2] * w3[0,1] + b[2,1])

The output of the Sigmoid function will always be a value between 0 and 1, and it will indicate the probability of the picture being a “cat”. The closer it is to 0, the more likely the picture is “not a cat”.
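Putting the six neurons together, one pass of a picture through the gadget can be sketched in Python like this. It is a minimal sketch, assuming that each neuron adds its b value before applying Relu or Sigmoid (as in z followed by a = Relu(z)). The helper names `layer` and `random_weights`, the random starting constants and the stand-in picture are all made up for illustration.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases, activation):
    """One level of neurons: each row of `weights` belongs to one neuron."""
    return [activation(sum(w * v for w, v in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def random_weights(n_neurons, n_inputs):
    return [[random.uniform(-0.01, 0.01) for _ in range(n_inputs)]
            for _ in range(n_neurons)]


random.seed(3)
x = [random.random() for _ in range(12288)]    # stand-in for one picture

w1, b0 = random_weights(3, 12288), [0.0] * 3   # first level: 3 neurons
w2, b1 = random_weights(2, 3), [0.0] * 2       # second level: 2 neurons
w3, b2 = random_weights(1, 2), [0.0]           # last level: 1 neuron

a0 = layer(x, w1, b0, relu)                    # a[0,1], a[0,2], a[0,3]
a1 = layer(a0, w2, b1, relu)                   # a[1,1], a[1,2]
y_hat = layer(a1, w3, b2, sigmoid)[0]          # probability of "cat"

print(y_hat)
```

With untrained random constants the output carries no information about cats yet; it only shows how the numbers flow from level to level.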

I was sure he had gone nuts. Why would this work!? Agreed, he will get an output which is between 0 and 1, but so what?

He said it will surely not work in the first iteration. But iteration has great power. What he would do is calculate the “y hat” value of each picture and compare it with the “y” value (which, recall from the earlier discussion, is the actual fact of whether the picture is “cat” or “not a cat”), so that he will know how far off the mark he is. He will then modify the constants (all the w and b values) and try again. He will keep repeating these iterations till “y” and “y hat” are nearly identical.
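The story does not say how Yaan modifies the constants after each comparison. One common choice is gradient descent, sketched below in Python on a deliberately tiny problem: a single Sigmoid neuron with two inputs instead of the full six-neuron gadget. The data, the step size `rate` and the helper `average_gap` are all made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Toy data: two made-up input values per "picture"; y = 1 means "cat".
X = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.9), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2)]
Y = [1, 1, 1, 0, 0, 0]

w = [0.0, 0.0]   # constants to be found
b = 0.0
rate = 0.5       # how strongly each iteration adjusts the constants


def average_gap(w, b):
    """Mean absolute difference between y and "y hat" over all examples."""
    return sum(abs(y - sigmoid(w[0] * x1 + w[1] * x2 + b))
               for (x1, x2), y in zip(X, Y)) / len(X)


gap_before = average_gap(w, b)

for iteration in range(2000):
    grad_w = [0.0, 0.0]
    grad_b = 0.0
    for (x1, x2), y in zip(X, Y):
        y_hat = sigmoid(w[0] * x1 + w[1] * x2 + b)
        error = y_hat - y            # how far off the mark this example is
        grad_w[0] += error * x1
        grad_w[1] += error * x2
        grad_b += error
    # Nudge every constant in the direction that shrinks the error.
    w[0] -= rate * grad_w[0] / len(X)
    w[1] -= rate * grad_w[1] / len(X)
    b -= rate * grad_b / len(X)

gap_after = average_gap(w, b)
print(gap_before, gap_after)
```

The gap between “y” and “y hat” starts at 0.5 (a pure guess) and shrinks as the iterations adjust w and b. The same idea carries over to the full gadget, only with many more constants to nudge.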

I was not sure if this was going to work. He was not too sure either, but he wanted to give it a try. But he did not have the computing power to calculate all this. I introduced him to my friend Moore, who promised to give him a machine which would double its computing power every two years. Yaan was very happy and set out to implement his idea using this machine. But more on this story in the next installment. Stay tuned.

References:

The cat picture: https://www.pexels.com/photo/grey-and-white-short-fur-cat-104827/

Deep learning tutorial: https://www.coursera.org/learn/neural-networks-deep-learning
