Is the Gradient Descent Optimizer an Alchemy of Loss?
Let’s begin with an example. Suppose a person is at the top of a mountain, blindfolded, and has to reach the bottom. How does he get down? He feels the terrain with his feet and moves at a comfortable pace, neither too fast nor too slow, until he reaches the bottom. Likewise, gradient descent is an iterative optimization algorithm for finding a local minimum of a function.
To find a local minimum of a function using gradient descent, we move each parameter in the direction opposite to the derivative at the current point: if the derivative is positive, the function is increasing there, so we move the parameter to the left; if the derivative is negative, the function is decreasing, so we move the parameter to the right to decrease the value of the function even more. Subtracting a negative value from a parameter moves it to the right.
Why Gradient Descent?
Let’s take the example of linear regression, which looks like f(x) = wx + b. Here we don’t know the optimal values of w and b, so we have to learn them from data. To do that we search for the values of w and b that minimize the loss, which is the mean squared error (MSE), given as:

L(w, b) = (1/N) · Σᵢ (yᵢ − (w·xᵢ + b))²
To find those values of w and b, we use gradient descent.
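As a quick sketch of the loss above in plain Python (the function name and the toy data in the usage line are illustrative choices, not part of the original article):

```python
# Mean squared error for the linear model f(x) = w*x + b.
def avg_loss(X, Y, w, b):
    n = len(X)
    total = 0.0
    for i in range(n):
        total += (Y[i] - (w * X[i] + b)) ** 2  # squared residual
    return total / n

# With w = 2, b = 0 the model fits Y = 2*X exactly, so the loss is 0.
print(avg_loss([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], 2.0, 0.0))
```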
Steps In Calculating Gradient Descent:
1. First we calculate the partial derivatives of the loss function with respect to w and b:

∂L/∂w = (1/N) · Σᵢ −2·xᵢ·(yᵢ − (w·xᵢ + b))
∂L/∂b = (1/N) · Σᵢ −2·(yᵢ − (w·xᵢ + b))
2. Update the values of w and b (initially both are set to 0):

w ← w − α · ∂L/∂w
b ← b − α · ∂L/∂b
We subtract (as opposed to adding) partial derivatives from the values of parameters because derivatives are indicators of growth of a function. If a derivative is positive at some point, then the function grows at this point. Because we want to minimize the objective function, when the derivative is positive we know that we need to move our parameter in the opposite direction (to the left on the axis of coordinates). When the derivative is negative (function is decreasing), we need to move our parameter to the right to decrease the value of the function even more. Subtracting a negative value from a parameter moves it to the right.
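To make this sign argument concrete, here is a tiny illustrative sketch on f(x) = x², whose derivative is 2x (the function, step size, and starting points are made up for illustration):

```python
# One gradient step on f(x) = x**2, whose derivative is f'(x) = 2*x.
# Subtracting the derivative moves x toward the minimum at 0 from either side.
def step(x, alpha=0.1):
    return x - alpha * 2 * x  # x - alpha * f'(x)

print(step(3.0))   # derivative is positive (+6), so the step moves left:  2.4
print(step(-3.0))  # derivative is negative (-6), subtracting it moves right: -2.4
```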
Now let’s implement all these steps in Python:
First, recall the mean squared error loss of the linear model, which the following code minimizes:

L(w, b) = (1/N) · Σᵢ (yᵢ − (w·xᵢ + b))²
And the partial derivatives computed inside the code are:

∂L/∂w = (1/N) · Σᵢ −2·xᵢ·(yᵢ − (w·xᵢ + b))
∂L/∂b = (1/N) · Σᵢ −2·(yᵢ − (w·xᵢ + b))
# Implementing Gradient Descent from Scratch
def update_w_and_b(X, Y, w, b, alpha):
    # X holds the feature values and Y the targets
    dl_dw = 0.0
    dl_db = 0.0
    n = len(X)
    for i in range(n):
        dl_dw += -2 * X[i] * (Y[i] - (w * X[i] + b))
        dl_db += -2 * (Y[i] - (w * X[i] + b))
    # update w and b
    w = w - (1 / float(n)) * alpha * dl_dw
    b = b - (1 / float(n)) * alpha * dl_db
    return w, b
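The update function above computes a single step, so a training loop has to call it repeatedly. A minimal sketch could look like the following; the train function, learning rate, epoch count, and toy data are illustrative assumptions, and update_w_and_b is repeated here so the snippet runs on its own:

```python
# Repeats update_w_and_b from the snippet above so this sketch is self-contained.
def update_w_and_b(X, Y, w, b, alpha):
    dl_dw, dl_db = 0.0, 0.0
    n = len(X)
    for i in range(n):
        dl_dw += -2 * X[i] * (Y[i] - (w * X[i] + b))
        dl_db += -2 * (Y[i] - (w * X[i] + b))
    w = w - (1 / float(n)) * alpha * dl_dw
    b = b - (1 / float(n)) * alpha * dl_db
    return w, b

# Illustrative training loop: start from w = b = 0 and repeat the update.
def train(X, Y, alpha=0.01, epochs=5000):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        w, b = update_w_and_b(X, Y, w, b, alpha)
    return w, b

# Toy data generated from y = 2x + 1; gradient descent should recover w and b.
X = [1.0, 2.0, 3.0, 4.0]
Y = [3.0, 5.0, 7.0, 9.0]
w, b = train(X, Y)
print(w, b)  # close to 2 and 1
```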
What Is the Role of Alpha Here?
We know that alpha is the learning rate; it sets the size of the step gradient descent takes on its way to the global minimum (the smallest among all the local minima).
1. If the learning rate is too small, the algorithm converges to the global minimum but takes too much time.
2. If the learning rate is too high, the updates overshoot and the algorithm does not converge.
3. If the learning rate is just right, the algorithm converges in a reasonable time.
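These three regimes can be seen on a toy one-dimensional example, f(x) = x² (the learning rates, step count, and starting point below are illustrative choices):

```python
# Run gradient descent on f(x) = x**2, whose derivative is 2*x.
def descend(alpha, steps=50, x0=5.0):
    x = x0
    for _ in range(steps):
        x = x - alpha * 2 * x
    return x

small = descend(0.01)  # too small: still far from 0 after 50 steps
good = descend(0.3)    # just right: essentially reaches the minimum
big = descend(1.1)     # too high: overshoots and diverges

print(abs(small), abs(good), abs(big))
```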
THAT’S IT FOR TODAY
HAPPY LEARNING
In the code snippet, X is the vector of feature values, not raw data points.