# Model Selection and the Principle of Minimum Description Length

```bibtex
@article{Hansen2001ModelSA,
  title   = {Model Selection and the Principle of Minimum Description Length},
  author  = {Mark H. Hansen and Bin Yu},
  journal = {Journal of the American Statistical Association},
  year    = {2001},
  volume  = {96},
  pages   = {746--774}
}
```

This article reviews the principle of minimum description length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This approach began with Kolmogorov's theory of algorithmic complexity, matured in the literature on information theory, and has recently received renewed attention within the statistics community. Here we…
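The two-part form of MDL described above — a code length for the model plus a code length for the data given the model — can be sketched numerically. The toy below is illustrative only (the polynomial setup and all names are invented here, not taken from the article); it uses the standard asymptotic Gaussian criterion (n/2) log(RSS/n) + (k/2) log n to choose a polynomial degree:

```python
import numpy as np

# Hedged sketch of two-part MDL model selection (illustrative, not the
# article's procedure). Description length is approximated by:
#   (n/2) * log(RSS/n)   -- cost of encoding the data given the model
# + (k/2) * log(n)       -- cost of encoding the k fitted parameters
rng = np.random.default_rng(0)
n = 200
x = np.linspace(-1.0, 1.0, n)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.1, size=n)  # true degree 2

def description_length(degree):
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((np.polyval(coeffs, x) - y) ** 2)
    k = degree + 1  # number of fitted coefficients
    return 0.5 * n * np.log(rss / n) + 0.5 * k * np.log(n)

best = min(range(1, 9), key=description_length)
print(best)  # a small degree near the true one should minimize the criterion
```

Higher degrees always lower the RSS, but each extra coefficient costs (1/2) log n nats, so the criterion stops rewarding fit once the improvement is within noise.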

## 740 Citations

Model Selection via Minimum Description Length

- Computer Science
- 2012

This thesis proposes a class of MDL procedures that incorporate the dependence structure within each individual or cluster, use data-adaptive penalties, and enjoy the advantages of Bayesian information criteria.

Thermodynamics of the Minimum Description Length on Community Detection

- Computer Science, ArXiv
- 2018

The Boltzmannian MDL (BMDL) is introduced: a formalization of the MDL principle whose parametric complexity is conveniently formulated as the free energy of an artificial thermodynamic system. It shows the crucial importance of phase transitions and other thermodynamic concepts for the problem of statistical modeling from an information-theoretic point of view.

Minimum description complexity

- Computer Science
- 2002

The proposed method answers the challenging question of quality evaluation in identification of stable LTI systems under a fair prior assumption on the unmodeled dynamics and provides a new solution to a class of denoising problems.

Model Selection Using Information Theory and the MDL Principle

- Computer Science
- 2004

The minimum description length (MDL) principle picks the model with the smallest description length, balancing fit against complexity; tree-based models are compared with regressions.

Minimum Description Length Model Selection Criteria for Generalized Linear Models

- Mathematics, Computer Science
- 2003

This paper derives several model selection criteria for generalized linear models (GLMs) following the principle of Minimum Description Length (MDL), and shows that mixture MDL can "bridge" the selection "extremes" AIC and BIC in the sense that it can mimic the performance of either criterion.
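The AIC and BIC "extremes" that mixture MDL is said to bridge differ only in their per-parameter penalty. A minimal numeric sketch (the candidate-model numbers below are made up for illustration and are not from the paper):

```python
import numpy as np

# For a Gaussian model with k parameters fit to n points:
#   AIC = n*log(RSS/n) + 2*k
#   BIC = n*log(RSS/n) + k*log(n)
# BIC charges log(n) per parameter instead of 2, so for moderate n it
# prefers smaller models than AIC does.

def aic(n, k, rss):
    return n * np.log(rss / n) + 2 * k

def bic(n, k, rss):
    return n * np.log(rss / n) + k * np.log(n)

n = 1000
small = (3, 105.0)    # (k, RSS): few parameters, slightly worse fit
large = (10, 102.0)   # more parameters, slightly better fit

print(aic(n, *small) > aic(n, *large))  # AIC prefers the larger model here
print(bic(n, *small) < bic(n, *large))  # BIC prefers the smaller one
```

A criterion that interpolates between the two penalties — as mixture MDL is reported to do — can therefore mimic whichever of AIC or BIC performs better for a given problem.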

Advances in Probabilistic Graphical Models

- Computer Science
- 2017

This thesis considers Bayesian network models and implements a score-based learning algorithm using the CMDL as a scoring function, the greedy hill climber as a search procedure and with the set of covering networks as the search space.

Minimum Description Length Principle for Linear Mixed Effects Models

- Computer Science
- 2014

This paper considers data with repeated measurements, studies the selection of fixed-effect covariates for linear mixed effects models, and proposes a class of MDL procedures that incorporate the dependence structure within each individual or cluster and use data-adaptive penalties suited to both finite- and infinite-dimensional data-generating mechanisms.

Greedy and Relaxed Approximations to Model Selection: A Simulation Study

- Computer Science
- 2008

This article performs extensive simulations comparing two algorithms for generating candidate models that mimic the best subsets of predictors of a given size, and finds that one method often cannot serve both selection and prediction purposes.

Minimum Description Length Model Selection in Gaussian Regression under Data Constraints

- Computer Science
- 2009

The effect of the data constraints on the selection criterion for Gaussian linear regression is demonstrated; in fact, various forms of the criterion are obtained by reformulating the shape of the data constraints.

## References

Showing 1–10 of 196 references

The Minimum Description Length Principle in Coding and Modeling

- Computer Science, IEEE Trans. Inf. Theory
- 1998

The normalized maximized likelihood, mixture, and predictive codings are each shown to achieve the stochastic complexity to within asymptotically vanishing terms.

A new look at the statistical model identification

- Mathematics
- 1974

The history of the development of statistical hypothesis testing in time series analysis is reviewed briefly and it is pointed out that the hypothesis testing procedure is not adequately defined as…

Bayes Factors and Choice Criteria for Linear Models

- Mathematics
- 1980

SUMMARY Global and local Bayes factors are defined and their respective roles examined as choice criteria among alternative linear models. The global Bayes factor is seen to function, in appropriate…

Data compression and histograms

- Computer Science
- 1992

The relationship between code length and the selection of the number of bins for a histogram density is considered for a sequence of iid observations on [0,1]. A uniform almost-sure asymptotic expansion for the code length is given and used to prove the asymptotic optimality of the selection rule.

Spline Adaptation in Extended Linear Models

- Computer Science
- 1998

Various alternatives to greedy, deterministic schemes are considered, a Bayesian framework for studying adaptation in the context of an extended linear model (ELM) is presented, and major test cases are Logspline density estimation and Triogram regression models.

Bayesian Model Averaging for Linear Regression Models

- Mathematics
- 1997

Abstract We consider the problem of accounting for model uncertainty in linear regression models. Conditioning on a single selected model ignores model uncertainty, and thus leads to the…

Model selection and prediction: Normal regression

- Computer Science
- 1993

A new lower bound is provided for prediction without refitting, while a lower bound for prediction with refitting was given by Rissanen, and a set of sufficient conditions for a model selection criterion to achieve these bounds are specified.

Density estimation by stochastic complexity

- Mathematics, IEEE Trans. Inf. Theory
- 1992

Two theorems are proved which together extend the universal coding theorems to a large class of data-generating densities. They give an asymptotic upper bound on the code redundancy, in order of magnitude, achieved with a special predictive type of histogram estimator, which sharpens a related bound.

Generalised linear model selection by the predictive least quasi-deviance criterion

- Mathematics
- 1996

We consider the problem of selecting a model with the best predictive ability in a class of generalised linear models. A predictive least quasi-deviance criterion is proposed to measure the…

Information-theoretic asymptotics of Bayes methods

- Computer Science, IEEE Trans. Inf. Theory
- 1990

The authors examine the relative entropy distance D_n between the true density and the Bayesian density, and show that the asymptotic distance is (d/2) log n + c, where d is the dimension of the parameter vector.