How to optimize your predictive model's performance in the bias vs. variance dilemma?

Prediction is an important part of science: it helps us correctly identify the factors that have a great impact on many aspects of our lives. Whenever we discuss a predictive model in particular, be it a regression or a classification problem, it is highly important to understand the prediction errors (bias and variance). Many times there is a trade-off between a model's ability to minimize bias and its ability to minimize variance. Gaining a proper understanding of these errors helps us not only build accurate models but also avoid the mistakes of overfitting and underfitting.


What is bias?

Models with high bias pay very little attention to the data and oversimplify the relationship being modelled. This generally leads to high error on both the training and the test data, which is what we generally call underfitting.

An example of underfitting can be seen in the graph below, where the data has a non-linear pattern while the predictive model is linear.

The best-fit line (the blue line) tells us that the model is not a good fit for the data, because the data follows a non-linear pattern.

The effect of underfitting is that the predictive model will consistently fail to predict the data over time: it remains linear while the reality is not strictly linear.
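To make this concrete, here is a minimal sketch in Python with scikit-learn. The data set is purely synthetic (I assume a quadratic trend for illustration, not the data behind the graph): a straight line fitted to it ends up with a high error on both the training and the test set.

    # High-bias sketch: a straight line fitted to data with a quadratic trend.
    # The data set is synthetic and only illustrative.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = np.linspace(0, 10, 200).reshape(-1, 1)
    y = 0.5 * X.ravel() ** 2 + rng.normal(scale=2.0, size=200)  # non-linear trend + noise

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    linear_model = LinearRegression().fit(X_train, y_train)

    # Underfitting: the error is large on BOTH sets, because a straight line
    # cannot capture the quadratic trend.
    print("train MSE:", mean_squared_error(y_train, linear_model.predict(X_train)))
    print("test  MSE:", mean_squared_error(y_test, linear_model.predict(X_test)))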


What is variance?

Variance is the opposite of bias: a model with high variance tries to reduce the error on the training data (it can even reach zero error) by increasing the complexity of the model parameters, which ends up capturing all the noise in the training data. As a result, these models perform very well on the training data but have high error rates on the test data, because they paid a lot of attention to the noise in the training data rather than to the overall trend. The figure below summarizes the concept of variance.


The above model was created using a degree-8 polynomial to describe the training data exactly, but it has no idea where the next data point should fall after point 39.
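As a rough illustration, the sketch below fits a degree-8 polynomial to a small, noisy synthetic data set (the data is an assumption of mine, not the data behind the figure): the training error comes out very low, while the error on unseen points, especially beyond the last training point, is much larger.

    # High-variance sketch: a degree-8 polynomial chasing the noise in a small
    # training set. Synthetic data, purely illustrative.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures, StandardScaler

    rng = np.random.default_rng(1)
    X_train = np.sort(rng.uniform(0, 40, 15)).reshape(-1, 1)   # few points up to x ~ 40
    y_train = np.sin(X_train.ravel() / 6) + rng.normal(scale=0.2, size=15)
    X_test = np.linspace(0, 50, 60).reshape(-1, 1)             # includes points past the training range
    y_test = np.sin(X_test.ravel() / 6)

    overfit_model = make_pipeline(
        PolynomialFeatures(degree=8, include_bias=False), StandardScaler(), LinearRegression()
    )
    overfit_model.fit(X_train, y_train)

    # Very low error where the model has already seen data ...
    print("train MSE:", mean_squared_error(y_train, overfit_model.predict(X_train)))
    # ... but a much larger error on new points, especially when extrapolating.
    print("test  MSE:", mean_squared_error(y_test, overfit_model.predict(X_test)))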

(There is more to be said on how to easily find out whether your model is suffering from bias or variance... stay tuned for my next article.)

Now that we understand what bias and variance are, we can see that there must be a trade-off between them: we do not want a model so simplistic that it does not capture any trend, and we also do not want a model so complex that it is overly sensitive to the noise in the data we train it on. We want to be exactly in the middle, as in the graph below.

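One way to see that middle ground is to sweep the model complexity and compare training and validation errors. Below is a minimal sketch on synthetic data (the data set, the degrees and the split are illustrative assumptions): low degrees underfit (both errors stay high), very high degrees overfit (the training error keeps falling while the validation error typically rises again), and the best degree sits somewhere in between.

    # Bias-variance trade-off sketch: training vs. validation error as the
    # polynomial degree (model complexity) grows. Synthetic data.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures, StandardScaler

    rng = np.random.default_rng(7)
    X = np.linspace(0, 6, 30).reshape(-1, 1)
    y = np.sin(X.ravel()) + 0.3 * rng.normal(size=30)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.4, random_state=0)

    for degree in [1, 2, 4, 8, 12]:
        model = make_pipeline(
            PolynomialFeatures(degree, include_bias=False), StandardScaler(), LinearRegression()
        )
        model.fit(X_train, y_train)
        train_err = mean_squared_error(y_train, model.predict(X_train))
        val_err = mean_squared_error(y_val, model.predict(X_val))
        # Underfitting shows up as both errors being high; overfitting as a
        # training error near zero with a much larger validation error.
        print(f"degree {degree:2d}  train MSE {train_err:.3f}  validation MSE {val_err:.3f}")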

So how do we go about optimizing our model's predictions without underfitting or overfitting?

One of the easiest ways to go about this is the following (a code sketch of the whole procedure follows the list):

  1. Create a list of lambdas (i.e. λ ∈ {0, 0.02, 0.08, ...}); as λ increases, the model parameters are penalized more heavily and shrink towards 0, giving a very simplistic model which is prone to bias.
  2. Create a set of models with different degrees or other variants.
  3. Iterate through the λs, and for each λ go through all the models to learn some parameters Θ.
  4. Compute the cross-validation error using the learned parameters Θ (trained with that λ) on the cross-validation cost function Jcv(Θ), evaluated without regularization (i.e. with λ = 0).
  5. Select the best combination, i.e. the one that produces the lowest error on the cross-validation set.
  6. Using the best combination of Θ and λ, evaluate the test cost function Jtest(Θ) to see whether it generalizes well to the problem.
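Here is a minimal sketch of that procedure with scikit-learn, on synthetic data. It assumes Ridge regression, whose alpha parameter plays the role of λ, and uses the plain mean squared error on the cross-validation and test sets in place of Jcv(Θ) and Jtest(Θ); the λ grid beyond the article's first values, the degree list and the data set are illustrative assumptions, not the ones behind the figures.

    # Model selection over (λ, degree) with a train / cross-validation / test split.
    # Synthetic data; Ridge's alpha stands in for λ.
    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures, StandardScaler

    rng = np.random.default_rng(42)
    X = np.linspace(0, 6, 150).reshape(-1, 1)
    y = np.sin(X.ravel()) + 0.3 * rng.normal(size=150)

    # 60/20/20 split into training, cross-validation and test sets.
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
    X_cv, X_test, y_cv, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

    lambdas = [0, 0.02, 0.08, 0.32, 1.28, 5.12]   # step 1: list of λ values (illustrative grid)
    degrees = [1, 2, 4, 8]                        # step 2: model variants

    best = None
    for lam in lambdas:                           # step 3: learn Θ for every (λ, degree) pair
        for deg in degrees:
            model = make_pipeline(
                PolynomialFeatures(deg, include_bias=False), StandardScaler(), Ridge(alpha=lam)
            )
            model.fit(X_train, y_train)
            # step 4: Jcv(Θ) -- plain MSE on the cross-validation set, no regularization term
            j_cv = mean_squared_error(y_cv, model.predict(X_cv))
            if best is None or j_cv < best[0]:    # step 5: keep the lowest cross-validation error
                best = (j_cv, lam, deg, model)

    j_cv, lam, deg, model = best
    # step 6: Jtest(Θ) with the chosen combination, to check generalization
    j_test = mean_squared_error(y_test, model.predict(X_test))
    print(f"best λ = {lam}, degree = {deg}, Jcv = {j_cv:.4f}, Jtest = {j_test:.4f}")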

After all these steps, plotting the error against lambda summarizes everything: at λ = 3 we expect the lowest error on our cross-validation data set, and it can be seen clearly that the model is neither overfitting nor underfitting the training data. See below how the model fits the training data.


The next step consists of testing how well our model parameters learned with λ = 3 generalize to unseen data.

Therefore, understanding exactly where your model stands in the bias vs. variance dilemma is critical for understanding the behavior of your prediction models.

If you are new to data science or have any interest in its applications, please feel free to reach out, as I will be more than happy to go through with you any of the terms I have not covered in this article.

Thanks for reading and enjoy the maths magic!

:)

BSousa


