Gradient Descent Algorithm - Machine Learning

The Gradient Descent algorithm is rooted in calculus and is one of the most commonly used optimization algorithms for training machine learning and deep learning models. It works by minimizing the error between the actual output and the output predicted by the model.

In mathematical terms, an optimization algorithm aims to minimize (or maximize) an objective function f(x) parameterized by x, using its derivative. Similarly, in machine learning, optimization is the task of minimizing a cost function parameterized by the model's parameters. The main objective of the gradient descent algorithm is to minimize this error function through iterative parameter updates.
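In symbols, using the standard formulation (notation assumed here: J(θ) is the cost function over parameters θ, and α is the learning rate), each iteration moves the parameters against the gradient:

```latex
\theta_{t+1} = \theta_t - \alpha \, \nabla_{\theta} J(\theta_t)
```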

In this article, we will discuss what Gradient Descent is, what a cost function is, and how the algorithm works. So let's get started.

What is Gradient Descent (Steepest Descent)?

Gradient Descent is an iterative optimization algorithm and one of the most commonly used methods for training machine learning and deep learning models. It helps in finding a local minimum of a function and thereby minimizing the error.

One good way to think about finding a local minimum or local maximum of a function with gradient descent: if we move towards the negative gradient, i.e., away from the gradient of the function at the current point, we reach a local minimum of that function. Similarly, if we move towards the positive gradient, i.e., in the direction of the gradient at the current point, we reach a local maximum of that function.


The main objective of using the gradient descent algorithm is to minimize the cost function through iteration. To achieve this goal, it performs two steps repeatedly:

  1. Calculates the first-order derivative of the function to compute the gradient or slope of that function.
  2. Moves in the direction opposite to the gradient, stepping away from the current point by alpha times the gradient, where alpha is defined as the learning rate. The learning rate is a tuning parameter in the optimization process that helps decide the length of the steps (a short sketch of both steps follows this list).
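
As a minimal Python sketch of those two steps (an illustrative example on the toy function f(x) = (x - 3)^2, not code from any library):

```python
# Minimal gradient descent on f(x) = (x - 3)^2, whose minimum is at x = 3.

def f(x):
    return (x - 3) ** 2

def grad_f(x):
    # Step 1: first-order derivative (slope) of f at the current point.
    return 2 * (x - 3)

alpha = 0.1   # learning rate: controls the step length
x = 0.0       # arbitrary starting point

for _ in range(50):
    # Step 2: move opposite to the gradient by alpha times its value.
    x = x - alpha * grad_f(x)

print(x)  # close to 3.0, the local (here also global) minimum
```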

What is a Cost Function?

The cost function measures the difference, or error, between the actual values and the values predicted by the model at its current position, expressed as a single real number. It improves the efficiency of the machine learning model by providing feedback, so that the model can reduce the error and find the local minimum. The algorithm then iterates continuously along the direction of the negative gradient until the cost function approaches its minimum. At this point of steepest descent (where the derivative is 0), the model stops learning further.
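
For instance, a commonly used cost function for regression is the mean squared error; a minimal sketch (with made-up numbers, purely for illustration):

```python
# Mean squared error: average squared difference between actual and predicted values.

def mse(actual, predicted):
    n = len(actual)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n

actual = [3.0, 5.0, 7.0]       # hypothetical ground-truth values
predicted = [2.5, 5.5, 6.0]    # hypothetical model predictions
print(mse(actual, predicted))  # a single real number summarizing the error
```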


The cost function is first calculated for a hypothesis built with initial parameters; gradient descent is then used to modify these parameters over the known (training) data so that the cost function keeps decreasing.
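
Putting the pieces together, here is a hedged sketch of that process, assuming a simple linear hypothesis h(x) = w * x + b and the mean squared error cost from above (the data points are made up for illustration):

```python
# Fit a line h(x) = w * x + b to a few made-up points with gradient descent on MSE.

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # generated from y = 2x + 1

w, b = 0.0, 0.0             # initial parameters of the hypothesis
alpha = 0.05                # learning rate
n = len(xs)

for epoch in range(2000):
    # Gradients of the MSE cost with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    # Update each parameter against its gradient.
    w -= alpha * grad_w
    b -= alpha * grad_b

print(w, b)  # approaches w = 2, b = 1
```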

How does Gradient Descent work?

The starting point is just an arbitrary point on the graph, used to evaluate the initial performance. At this starting point, we take the first derivative, or slope, and use a tangent line to measure its steepness. This slope then informs the updates to the parameters (weights and bias).

The slope is steepest at the starting (arbitrary) point; as new parameters are generated, the steepness gradually decreases until it approaches the lowest point of the function, which is called the point of convergence.
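
In code, this point of convergence is usually detected by stopping once the gradient's magnitude falls below a small tolerance; a minimal sketch, assuming such a stopping rule:

```python
# Gradient descent that stops once the slope is nearly flat (point of convergence).
def minimize(grad, x0, alpha=0.1, tol=1e-6, max_iters=10_000):
    x = x0
    for _ in range(max_iters):
        g = grad(x)
        if abs(g) < tol:      # slope ~ 0: we have (numerically) converged
            break
        x -= alpha * g        # otherwise keep stepping downhill
    return x

# Example: same toy function as before, f(x) = (x - 3)^2.
print(minimize(lambda x: 2 * (x - 3), x0=0.0))  # about 3.0
```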


One of the important parameters controlling the speed of learning is the learning rate. It is defined as the size of the step taken toward the minimum, or lowest point. It is typically a small value that is evaluated and adjusted based on the behavior of the cost function. A high learning rate produces larger steps but risks overshooting the minimum, while a low learning rate takes smaller steps, which sacrifices overall efficiency but gives the advantage of more precision.
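
The trade-off can be seen on the same toy function; the learning rates below are made-up values chosen only to illustrate the behavior:

```python
# Compare learning rates on f(x) = (x - 3)^2, whose derivative is 2 * (x - 3).
def run(alpha, steps=20, x=0.0):
    for _ in range(steps):
        x -= alpha * 2 * (x - 3)
    return x

print(run(0.01))   # low rate: still far from 3 after 20 steps (slow but steady)
print(run(0.1))    # moderate rate: close to 3
print(run(1.1))    # too high: the steps overshoot and the iterates diverge
```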
