Avoid Overfitting with Regularization
Polynomial Regression & Overfitting

Have you ever created a machine learning model that is perfect for the training samples but gives very bad predictions for unseen samples? Have you ever wondered why this happens? This article explains overfitting, which is one of the reasons for poor predictions on unseen samples. It also presents, in simple steps, a regularization technique based on regression to make it clear how to avoid overfitting.

The focus of machine learning (ML) is to train an algorithm with training data in order to create a model that is able to make correct predictions for unseen data (test data). To create a classifier, for example, a human expert will start by collecting the data required to train the ML algorithm. The human is responsible for finding the best types of features to represent each class, features that are capable of discriminating between the different classes. Such features will be used to train the ML algorithm. Suppose we want to build an ML model that classifies images as containing cats or not, using the following training data.

The first question we have to answer is “what are the best features to use?”. This is a critical question in ML, because the better the features used, the better the predictions the trained ML model makes, and vice versa. Let us try to visualize such images and extract some features that are representative of cats. Some of the representative features may be the existence of two dark eye pupils and two ears with a diagonal direction. Assume that we somehow extracted such features from the above training images and created a trained ML model. Such a model can work with a wide range of cat images because these features exist in most cats. We can test the model using some unseen data such as the following. Assume that the classification accuracy on the test data is x%.

One may want to increase the classification accuracy. The first thing to think of is using more features than the two used previously, because the more discriminative the features, the better the accuracy. By inspecting the training data again, we can find more features, such as the overall image color (all training cat samples are white) and the eye iris color (all training samples have yellow irises). The feature vector will then have the 4 features shown below, which will be used to retrain the ML model.

After training the model, the next step is to test it. The expected result after using the new feature vector is that the classification accuracy will decrease to less than x%. But why? The accuracy drops because some of the new features exist in the training data but not in cat images in general. All training images have a white image color and yellow eye irises, but these properties do not generalize to all cats. In the testing data, some cats have a black or yellow color rather than the white used in training, and some cats do not have yellow irises.

This case, in which the features are powerful for the training samples but very poor for the testing samples, is known as overfitting. The model is trained with some features that are exclusive to the training data and do not exist in the testing data.
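
The cat discussion can be reduced to a toy sketch. It assumes each image is already summarized by four hypothetical binary features (dark pupils, diagonal ears, white color, yellow iris) and swaps in a very simple learner, a Find-S-style conjunction that predicts "cat" only when an image has every feature shared by all training cats. This is not the author's model, and all samples below are invented for illustration.

```python
# Toy illustration with hypothetical data: each image is reduced to four
# binary features (1 = present, 0 = absent), in this order.
FEATURES = ["dark_pupils", "diagonal_ears", "white_color", "yellow_iris"]


def train_conjunction(cat_samples, feature_idx):
    """Find-S-style learner: keep every candidate feature present in all training cats."""
    return [i for i in feature_idx if all(s[i] == 1 for s in cat_samples)]


def predict(rule, sample):
    """Predict 'cat' only if the sample has every feature in the learned rule."""
    return all(sample[i] == 1 for i in rule)


def accuracy(rule, samples, labels):
    """Fraction of samples whose prediction matches the label."""
    correct = sum(predict(rule, s) == y for s, y in zip(samples, labels))
    return correct / len(samples)


# Hypothetical training cats: all white with yellow irises, as in the article.
train_cats = [(1, 1, 1, 1), (1, 1, 1, 1), (1, 1, 1, 1)]
train_labels = [True, True, True]

# Hypothetical unseen data: cats of other colors or iris colors, plus non-cats.
test_samples = [
    (1, 1, 1, 1),  # white cat, yellow irises
    (1, 1, 0, 1),  # black cat, yellow irises
    (1, 1, 0, 0),  # yellow-furred cat, green irises
    (1, 1, 1, 0),  # white cat, blue irises
    (0, 0, 1, 0),  # white non-cat object
    (0, 1, 0, 0),  # non-cat with diagonal ear-like shapes
]
test_labels = [True, True, True, True, False, False]

two_rule = train_conjunction(train_cats, [0, 1])         # pupils + ears
four_rule = train_conjunction(train_cats, [0, 1, 2, 3])  # + color + iris

for name, rule in (("2 features", two_rule), ("4 features", four_rule)):
    print(f"{name}: rule={[FEATURES[i] for i in rule]}, "
          f"train={accuracy(rule, train_cats, train_labels):.2f}, "
          f"test={accuracy(rule, test_samples, test_labels):.2f}")
```

Both rules are perfect on the training cats (train=1.00), but the four-feature rule also demands white fur and yellow irises, so it rejects the black, yellow-furred, and blue-eyed cats and its test accuracy falls from 1.00 to 0.50. The learned rule fits a property of this particular training set rather than of cats in general, which is the overfitting pattern described above.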

The goal of the previous discussion was to make the idea of overfitting simple through a high-level example. To get into the details, it is preferable to work with a simpler example. That is why the rest of the discussion will be based on a regression example.
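
The same effect can be previewed in a regression setting with a minimal sketch. It assumes NumPy and a made-up dataset (noisy samples of a sine curve) and is not the author's example. It fits polynomials of degree 1, 3, and 9 to 10 training points and reports the mean squared error (MSE) on the training points and on 200 unseen points drawn from the same curve.

```python
import numpy as np

# Hypothetical data (an assumption, not from the article): a sine curve plus noise.
rng = np.random.default_rng(seed=0)  # seeded so the run is repeatable


def make_data(n):
    """Draw n noisy samples of y = sin(2*pi*x) for x in [0, 1]."""
    x = np.sort(rng.uniform(0.0, 1.0, n))
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
    return x, y


def mse(model, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))


x_train, y_train = make_data(10)   # small training set
x_test, y_test = make_data(200)    # unseen data from the same curve

results = {}
for degree in (1, 3, 9):
    # Least-squares polynomial fit; Polynomial.fit maps x to [-1, 1] for stability.
    model = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree {degree}: train MSE = {results[degree][0]:.4f}, "
          f"test MSE = {results[degree][1]:.4f}")
```

With 10 points, the degree-9 polynomial has enough coefficients to pass through every training point, so its training MSE is essentially zero. Like the four-feature cat model, it has fitted the noise specific to the training samples, and its test MSE is typically much larger than that of the degree-3 fit.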

