Overfitting

In statistics, the name given to the act of mistaking noise for a signal is overfitting.

Suppose that you’re some sort of petty criminal and I’m your boss. I deputize you to figure out a good method for picking combination locks of the sort you might find in a middle school—maybe we want to steal everybody’s lunch money. I want an approach that will give us a high probability of picking a lock anywhere and anytime. I give you three locks to practice on—a red one, a black one, and a blue one.

After experimenting with the locks for a few days, you come back and tell me that you’ve discovered a foolproof solution. If the lock is red, you say, the combination is 27-12-31. If it’s black, use the numbers 44-14-19. And if it’s blue, it’s 10-3-32.

I’d tell you that you’ve completely failed your mission. You’ve clearly figured out how to open these three particular locks. But you haven’t done anything to advance our theory of lock-picking—to give us some hope of picking locks when we don’t know the combination in advance. I’d have been interested in knowing, say, whether there was a good type of paper clip for picking these locks, or some sort of mechanical flaw we could exploit. Or, failing that, whether there was some trick to detect the combination: maybe certain types of numbers are used more often than others? You’ve given me an overly specific solution to a general problem. This is overfitting, and it leads to worse predictions.

(Or it would be like picking the Cavs or the Spurs to win the NBA championship in 2015 simply because a LeBron-led team or the Spurs had won the last three championships.)

The name overfitting comes from the way that statistical models are “fit” to match past observations. The fit can be too loose—this is called underfitting—in which case you will not be capturing as much of the signal as you could. Or it can be too tight—an overfit model—which means that you’re fitting the noise in the data rather than discovering its underlying structure. The latter error is much more common in practice.
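The loose-versus-tight fit described above can be made concrete with a small numerical sketch. The example below (illustrative only; the signal, noise level, and polynomial degrees are assumptions, not from the excerpt) fits polynomials of increasing degree to noisy samples of a known signal. The high-degree fit chases the noise: its error on the training points shrinks, while its error on fresh, held-out points from the same signal does not improve and typically worsens.

```python
import numpy as np

rng = np.random.default_rng(0)

# Underlying signal: y = sin(x). We only ever see noisy samples of it.
x_train = np.linspace(0, 3, 20)
x_test = np.linspace(0.05, 2.95, 20)
y_train = np.sin(x_train) + rng.normal(0, 0.3, x_train.size)
y_test = np.sin(x_test) + rng.normal(0, 0.3, x_test.size)

def poly_mse(degree):
    """Fit a degree-`degree` polynomial to the training points and
    return (training MSE, held-out MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

for degree in (1, 3, 15):
    train_err, test_err = poly_mse(degree)
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

Degree 1 underfits (it misses curvature the signal actually has), degree 3 is roughly right, and degree 15 overfits: with 16 coefficients and 20 points it can nearly memorize the training noise, which is exactly the “looks better on paper, performs worse in the real world” pattern the passage describes.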

Overfitting represents a double whammy: it makes our model look better on paper but perform worse in the real world. Because of the latter trait, an overfit model eventually will get its comeuppance if and when it is used to make real predictions. Because of the former, it may look superficially more impressive until then, claiming to make very accurate and newsworthy predictions and to represent an advance worth publishing in a journal or selling to a client, crowding out more honest models from the marketplace. But if the model is fitting noise, it has the potential to hurt the science.

From The Signal and the Noise: Why So Many Predictions Fail—but Some Don’t by Nate Silver

To forecast a set of data points properly, it's necessary to first understand the factors that might influence the results. Fitting a model to the points alone, without that understanding, is futile.


More articles by Stephanie Bartruff
