Regularization Technique
Introduction
Regularization is a procedure used to reduce errors by fitting the function appropriately on the given training set while avoiding overfitting.
Regularization is one of the key concepts in machine learning. It is a technique that prevents the model from overfitting by adding a penalty on model complexity.
Sometimes a machine learning model performs well on the training dataset but does not perform well on the test dataset. This means the model cannot predict the correct output when dealing with new or unseen data, because it has fitted the noise in the training data; such a model is called an overfitted model. This issue can be addressed with the help of a regularization technique.
This technique allows us to keep all features (variables) in the model while shrinking their magnitudes. Hence, it maintains both the accuracy and the generalization of the model.
Overfitting - Overfitting occurs when the trained model performs well on the training data but poorly on the test data.
Fig.1.
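To see overfitting in action, here is a minimal sketch (the synthetic data, the degree-9 polynomial, and the use of scikit-learn are illustrative assumptions, not part of the original text): a model with too much capacity drives the training error to almost zero while the test error stays large.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Synthetic data: a noisy linear trend.
X = rng.uniform(0, 1, size=(30, 1))
y = 2.0 * X.ravel() + rng.normal(scale=0.2, size=30)

X_train, y_train = X[:10], y[:10]   # small training set
X_test, y_test = X[10:], y[10:]

# A degree-9 polynomial has enough capacity to chase the noise.
poly = PolynomialFeatures(degree=9)
model = LinearRegression().fit(poly.fit_transform(X_train), y_train)

train_mse = mean_squared_error(y_train, model.predict(poly.transform(X_train)))
test_mse = mean_squared_error(y_test, model.predict(poly.transform(X_test)))
print("train MSE:", train_mse)  # near zero
print("test MSE:", test_mse)    # much larger -> overfitting
```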
Working of regularization
Regularization works mainly by adding a penalty term to the loss function of a complex model. Let's consider the equation of multiple linear regression:

Y = β0 + β1X1 + β2X2 + … + βnXn

In the above equation, Y is the dependent (response) variable, and the rest are independent variables.
X1, X2, …, Xn are the features for Y.
β0, β1, …, βn are the weights (coefficients) attached to the features.
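To make the penalty term concrete, here is a minimal NumPy sketch (the toy data, the function name, and the choice of an L2 penalty are illustrative assumptions) that adds a penalty on the weights to the ordinary sum of squared residuals:

```python
import numpy as np

def regularized_loss(X, y, beta0, beta, alpha):
    """Sum of squared residuals plus an L2 penalty on the weights.

    X: (n_samples, n_features) feature matrix (X1 ... Xn)
    beta0: intercept; beta: weight vector (beta1 ... betan)
    alpha: strength of the penalty term
    """
    residuals = y - (beta0 + X @ beta)   # Y - (b0 + b1*X1 + ... + bn*Xn)
    ssr = np.sum(residuals ** 2)         # plain least-squares loss
    penalty = alpha * np.sum(beta ** 2)  # penalty term added by regularization
    return ssr + penalty

# Toy example with two features.
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5]])
y = np.array([2.0, 3.0, 5.0])
print(regularized_loss(X, y, beta0=0.1, beta=np.array([1.2, 0.3]), alpha=0.5))
```

The two techniques below differ only in this penalty: ridge uses alpha * Σβ², while lasso uses alpha * Σ|β|.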
Regularization Techniques
· Ridge Regression (L2)
· Lasso Regression (L1)
Ridge regression
· Ridge regression is a type of linear regression in which a small amount of bias is added so that we can get better long-term predictions.
· Ridge regression works by applying a penalty term (to shrink the weights) to overcome overfitting.
· In least squares, the line that minimizes the sum of squared residuals is taken as the best-fit line.
· Since this line passes through all 3 training data points, the sum of squared residuals = 0.
· On the test dataset, however, the line has high variance.
· Variance here means the difference in fit between the training and test datasets.
· This regression model is overfitting the training dataset.
Fig.2.
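Here is a minimal sketch of this scenario (the three collinear training points and the test values are made up for illustration): ordinary least squares fits the training points perfectly, yet misses the test points badly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Three collinear training points: the least-squares line passes through them exactly.
X_train = np.array([[1.0], [2.0], [3.0]])
y_train = np.array([2.0, 4.0, 6.0])          # exactly y = 2x

# Test points drawn from a different, noisier relationship.
X_test = np.array([[1.5], [2.5], [3.5]])
y_test = np.array([2.4, 3.6, 4.4])

ols = LinearRegression().fit(X_train, y_train)
train_residuals = y_train - ols.predict(X_train)

print("sum of squared residuals (train):", np.sum(train_residuals ** 2))  # 0.0
print("test errors:", y_test - ols.predict(X_test))  # large -> high variance
```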
· Ridge regression works by slightly increasing the bias in order to reduce the variance.
· It does this by changing the slope of the line.
· The model's performance may be slightly worse on the training set, but it will perform more consistently across the training and test datasets.
· The slope is reduced by the ridge penalty, and therefore the model becomes less sensitive to changes in the independent variable.
· Least squares regression: min(sum of squared residuals)
· Ridge regression: min(sum of squared residuals + alpha * slope^2)
· As alpha increases, the slope of the regression line is reduced and becomes more horizontal.
· As alpha increases, the model becomes less sensitive to variations in the independent variable.
Fig.4.
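Here is a minimal sketch of the alpha effect, assuming scikit-learn's Ridge and a made-up one-feature dataset; as alpha grows, the fitted slope shrinks toward zero:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X = rng.uniform(0, 5, size=(20, 1))
y = 3.0 * X.ravel() + rng.normal(scale=1.0, size=20)

# As alpha increases, the penalty alpha * slope^2 dominates,
# so the fitted slope is pulled toward zero (the line flattens).
for alpha in [0.01, 1.0, 10.0, 100.0]:
    slope = Ridge(alpha=alpha).fit(X, y).coef_[0]
    print(f"alpha={alpha:7.2f}  slope={slope:.3f}")
```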
Lasso regression
· Lasso Regression is similar to ridge regression.
· It also introduces a bias term, but instead of squaring the slope, the absolute value of the slope is added as the penalty term.
· Least squares regression: min(sum of squared residuals)
· Lasso regression: min(sum of squared residuals + alpha * |slope|)
· The slope is reduced by the lasso penalty, and therefore the model becomes less sensitive to the independent variable (e.g., # years of experience).
Fig.5.
· The effect of alpha on lasso regression is similar to its effect on ridge regression.
· As alpha increases, the slope of the regression line is reduced and becomes more horizontal.
· Lasso regression helps reduce overfitting, and it is particularly useful for feature selection.
· Lasso regression can be useful if we have several independent variables that are useless.
· Ridge regression can shrink the slope close to zero (but not exactly zero), whereas lasso regression can shrink the slope to exactly zero, as the sketch below shows.
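Here is a minimal sketch of this last contrast, assuming scikit-learn's Ridge and Lasso on synthetic data with two useless features; the alpha values are arbitrary:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(2)
n = 200
X = rng.normal(size=(n, 3))
# Only the first feature matters; the other two are useless.
y = 4.0 * X[:, 0] + rng.normal(scale=0.5, size=n)

ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print("ridge coefs:", np.round(ridge.coef_, 4))  # small but nonzero everywhere
print("lasso coefs:", np.round(lasso.coef_, 4))  # useless features driven to exactly 0
```

Because the lasso penalty sets the coefficients of the useless features exactly to zero, those features drop out of the model entirely, which is why lasso performs feature selection while ridge does not.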