Is Gradient Descent Optimizer An Alchemy of Loss?
Source: Clairvoyant


Let’s begin with an example. Suppose a person is at the top of a mountain, is blindfolded, and has to reach the bottom. How does he get there? He feels the terrain with each step and moves at a comfortable pace, neither too fast nor too slow, until he eventually reaches the bottom. Likewise, gradient descent is an iterative optimization algorithm for finding a local minimum of a function.

To find a local minimum of a function using gradient descent, we move the parameter in the direction opposite to the gradient of the function at the current point. If the derivative is positive (the function is increasing), we move the parameter to the left; if the derivative is negative (the function is decreasing), we move the parameter to the right to decrease the value of the function even more. Subtracting a negative value from a parameter moves it to the right.
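
For instance, here is a minimal sketch of this idea in one dimension. The toy objective f(x) = x², the starting point, and the learning rate are illustrative assumptions of mine, not something from the article:

# A minimal sketch of 1-D gradient descent on a toy objective f(x) = x**2
# (the objective, starting point, and learning rate are illustrative assumptions)
def gradient_descent_1d(x0, alpha=0.1, steps=50):
    x = x0
    for _ in range(steps):
        grad = 2 * x          # derivative of x**2 at the current point
        x = x - alpha * grad  # step in the direction opposite to the derivative
    return x

print(gradient_descent_1d(5.0))   # approaches the minimum at x = 0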

Why Gradient Descent?

Let’s take the example of linear regression, which looks like f(x) = wx + b. Here we don’t know the optimal values of w and b, so we have to learn them from the data. To do that, we search for values of w and b that minimize the loss, which is the mean squared error (MSE), given as:

L(w, b) = (1/N) · Σᵢ (yᵢ − (w·xᵢ + b))²


To find those values of w and b, we use gradient descent.
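
As a quick illustration, a small helper that computes this loss on plain Python lists might look like the sketch below (the name mse_loss and the use of plain lists are my own choices, not from the article):

# A small sketch that computes the MSE loss for f(x) = w*x + b
# (the name mse_loss and the use of plain Python lists are illustrative assumptions)
def mse_loss(X, Y, w, b):
    n = len(X)
    total = 0.0
    for i in range(n):
        error = Y[i] - (w * X[i] + b)   # residual for the i-th point
        total += error ** 2
    return total / n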

Steps In Calculating Gradient Descent:

1.    First, we calculate the partial derivatives of the loss function with respect to w and b:

∂L/∂w = (1/N) · Σᵢ −2·xᵢ·(yᵢ − (w·xᵢ + b))
∂L/∂b = (1/N) · Σᵢ −2·(yᵢ − (w·xᵢ + b))



2.    Update the values of w and b (initially, w and b are set to 0):

  • w = 0
  • b = 0

w = w − α · ∂L/∂w

b = b − α · ∂L/∂b


We subtract (as opposed to add) the partial derivatives from the parameter values because a derivative indicates the direction in which a function grows. If the derivative is positive at some point, then the function is increasing at that point. Because we want to minimize the objective function, when the derivative is positive we know we need to move the parameter in the opposite direction (to the left on the coordinate axis). When the derivative is negative (the function is decreasing), we need to move the parameter to the right to decrease the value of the function even more. Subtracting a negative value from a parameter moves it to the right.

Now let’s implement all these steps using Python code.

First, recall the mean squared error loss for the linear equation, which we use in the following code:

L(w, b) = (1/N) · Σᵢ (yᵢ − (w·xᵢ + b))²


And the partial derivatives we calculated and use in the following code are:

∂L/∂w = (1/N) · Σᵢ −2·xᵢ·(yᵢ − (w·xᵢ + b))
∂L/∂b = (1/N) · Σᵢ −2·(yᵢ − (w·xᵢ + b))



# Implementing gradient descent from scratch
def update_w_and_b(X, Y, w, b, alpha):
    # X is the feature vector, Y is the target vector
    dl_dw = 0.0   # accumulated partial derivative of the loss w.r.t. w
    dl_db = 0.0   # accumulated partial derivative of the loss w.r.t. b
    n = len(X)

    for i in range(n):
        dl_dw += -2 * X[i] * (Y[i] - (w * X[i] + b))
        dl_db += -2 * (Y[i] - (w * X[i] + b))

    # update w and b (the 1/n averages the accumulated gradients)
    w = w - (1 / float(n)) * alpha * dl_dw
    b = b - (1 / float(n)) * alpha * dl_db

    return w, b
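
To see how this update step might be used, here is a minimal training loop built around update_w_and_b. The synthetic data, epoch count, and learning rate are illustrative assumptions, not values from the article:

# A minimal training loop using update_w_and_b
# (the synthetic data, epoch count, and learning rate are illustrative assumptions)
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [3.0, 5.0, 7.0, 9.0, 11.0]   # roughly follows y = 2x + 1

w, b = 0.0, 0.0
alpha = 0.01

for epoch in range(1000):
    w, b = update_w_and_b(X, Y, w, b, alpha)

print(w, b)   # should end up close to w ≈ 2 and b ≈ 1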


What Is the Role of Alpha Here?

Alpha is the learning rate; it controls the size of the step gradient descent takes toward the global minimum (the smallest local minimum among all local minima).



1.    If the learning rate is too small, the algorithm converges to the minimum but takes too much time.

2.    If the learning rate is too high, the algorithm overshoots and does not converge.

3.    If the learning rate is well chosen, the algorithm converges in a reasonable time.
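
As a rough illustration of these three cases, the sketch below runs update_w_and_b with a few different learning rates on the same toy data. The specific alpha values and the synthetic data are illustrative assumptions:

# A rough illustration of the effect of the learning rate
# (the alpha values and the synthetic data are illustrative assumptions)
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [3.0, 5.0, 7.0, 9.0, 11.0]   # roughly follows y = 2x + 1

for alpha in (0.0001, 0.01, 0.2):
    w, b = 0.0, 0.0
    for epoch in range(1000):
        w, b = update_w_and_b(X, Y, w, b, alpha)
    print(alpha, w, b)

# alpha = 0.0001: has not converged after 1000 updates (too small, too slow)
# alpha = 0.01:   ends up close to w ≈ 2, b ≈ 1 (converges)
# alpha = 0.2:    the values blow up (too large, overshoots and diverges)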


THAT'S IT FOR TODAY

HAPPY LEARNING
