AIC/BIC in Model Selection

It is well known that the Akaike information criterion (AIC) and Schwarz's Bayesian information criterion (BIC) are both penalized-likelihood information criteria. AIC is an estimate of a constant plus the relative distance between the unknown true likelihood function of the data and the fitted likelihood function of the model, whereas BIC is an estimate of a function of the posterior probability of a model being true, under a certain Bayesian setup.

Both consist of a goodness-of-fit term plus a penalty that controls over-fitting, and they provide a standardized way to balance having enough parameters to adequately model the relationships among variables in the population (sensitivity) against not over-fitting the model or suggesting nonexistent relationships (specificity). Hence, the smaller the AIC/BIC, the better the model.
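
Concretely, both criteria share the same fit term and differ only in the penalty: AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where L is the maximized value of the likelihood function, k is the number of free parameters, and n is the sample size. A minimal sketch of the two formulas in Python (the function names are mine, purely illustrative):

    import math

    def aic(log_likelihood, k):
        # Goodness-of-fit term (-2 ln L) plus a penalty of 2 per parameter.
        return 2 * k - 2 * log_likelihood

    def bic(log_likelihood, k, n):
        # Same fit term, but the per-parameter penalty grows as ln(n).
        return k * math.log(n) - 2 * log_likelihood

Since ln(n) exceeds 2 once n is greater than about 7, BIC's penalty is heavier than AIC's for virtually any realistic sample size.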

So, which of these information criteria should be used when comparing a number of models? Ideally, they should be used concurrently. Despite various subtle theoretical differences between the two, their only practical difference is the size of the penalty: BIC penalizes model complexity more heavily and therefore tends to be more parsimonious, and it is sometimes preferred over AIC for the reasons highlighted below (a small simulation sketch follows the list):

  1. BIC is "consistent": a consistent selector is one that selects the true model with probability approaching 100% as n tends to infinity. AIC is not consistent, because it has a non-vanishing chance of choosing an unnecessarily complex model as n becomes large.
  2. BIC considers Type I and Type II errors to be about equally undesirable, while AIC considers Type II errors to be more undesirable than Type I errors unless n is very small.
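
To make the consistency point concrete, here is a minimal simulation sketch, assuming Gaussian errors and a polynomial-regression setup of my own choosing (the data-generating model, sample sizes, and the helper gaussian_max_loglik are all illustrative assumptions): the true model has degree 1, and as n grows BIC reliably recovers it, while AIC retains a non-vanishing chance of picking a higher degree.

    import numpy as np

    rng = np.random.default_rng(0)

    def gaussian_max_loglik(y, X):
        # OLS fit; returns the maximized Gaussian log-likelihood and the
        # number of free parameters (regression coefficients + error variance).
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        n = len(y)
        sigma2 = resid @ resid / n  # MLE of the error variance
        loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
        return loglik, X.shape[1] + 1

    for n in (50, 5000):
        x = rng.uniform(-1, 1, n)
        y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)  # true model: degree 1
        aic_scores, bic_scores = [], []
        for degree in range(1, 6):
            X = np.vander(x, degree + 1)  # candidate polynomial of this degree
            loglik, k = gaussian_max_loglik(y, X)
            aic_scores.append(2 * k - 2 * loglik)
            bic_scores.append(k * np.log(n) - 2 * loglik)
        print(n, "AIC picks degree", 1 + int(np.argmin(aic_scores)),
              "| BIC picks degree", 1 + int(np.argmin(bic_scores)))

On any single run AIC may well pick degree 1 too; the point is that its probability of over-selecting does not shrink to zero as n grows, whereas BIC's does.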

In classical hypothesis testing, over-fitting (a Type I error) is considered worse than under-fitting (a Type II error), whereas for prediction a Type II error can be just as harmful.


Conclusion

AIC often risks choosing too large a model, whereas BIC often risks choosing too small a model. In modelling there is always a risk of under-fitting (a particular concern when n is small) or over-fitting (a particular concern when n is large). For cases where n is small, criteria with lower under-fitting rates, such as AIC, often seem better, whereas in cases where n is large (where "large" is relative), more parsimonious criteria, such as BIC, often seem better.

It is worth noting that AIC and BIC cannot tell you how well a particular model explains your data; they can only tell you whether it strikes a better balance between goodness of fit and complexity than the other candidate models.
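
In practice, most statistical libraries report both criteria for a fitted model, so the comparison is straightforward; for example, statsmodels exposes them on regression results (the simulated data below are purely illustrative):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    x = rng.normal(size=200)
    y = 1.0 + 2.0 * x + rng.normal(size=200)

    # Two candidate models: linear vs. quadratic in x.
    linear = sm.OLS(y, sm.add_constant(x)).fit()
    quadratic = sm.OLS(y, sm.add_constant(np.column_stack([x, x ** 2]))).fit()

    # Neither number means anything in isolation; only the comparison does.
    print("linear:    AIC =", linear.aic, " BIC =", linear.bic)
    print("quadratic: AIC =", quadratic.aic, " BIC =", quadratic.bic)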
