Model Selection and the Principle of Minimum Description Length

@article{Hansen2001ModelSA,
  title={Model Selection and the Principle of Minimum Description Length},
  author={Mark H. Hansen and Bin Yu},
  journal={Journal of the American Statistical Association},
  year={2001},
  volume={96},
  pages={746--774}
}
This article reviews the principle of minimum description length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This approach began with Kolmogorov's theory of algorithmic complexity, matured in the literature on information theory, and has recently received renewed attention within the statistics community. Here we… 
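The abstract's central idea, scoring each model by how compactly it lets us describe the data, can be made concrete with a small sketch. The following is illustrative code of our own, not anything from the article: it scores polynomial regression models by the classical two-part code length, the negative log maximum likelihood plus (k/2) log n nats for a model with k parameters, the asymptotic form the MDL literature shares with BIC.

```python
# A minimal two-part MDL sketch (ours, not the article's code): pick the
# polynomial order whose fitted model gives the shortest description of y.
import numpy as np

def description_length(y, X):
    """Two-part code length: -log max-likelihood + (k/2) log n (in nats)."""
    n, k = X.shape
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    neg_log_lik = 0.5 * n * np.log(rss / n)    # Gaussian model, up to constants
    return neg_log_lik + 0.5 * k * np.log(n)   # cost of coding k parameters

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(0, 0.5, x.size)  # true order is 2

# Score polynomial orders 0..6 and keep the shortest description.
dl = [description_length(y, np.vander(x, d + 1)) for d in range(7)]
print("selected order:", int(np.argmin(dl)))
```

On data generated from a quadratic this typically selects order 2; the richer MDL forms surveyed in the article (mixture and normalized maximum likelihood codes) refine the crude (k/2) log n parameter-coding term.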
Model Selection via Minimum Description Length
  • Li
  • 2012
TLDR
This thesis proposes a class of MDL procedures which incorporate the dependence structure within individuals or clusters via data-adaptive penalties and enjoy the advantages of Bayesian information criteria.
Thermodynamics of the Minimum Description Length on Community Detection
TLDR
The Boltzmannian MDL (BMDL) is introduced: a formalization of the MDL principle whose parametric complexity is conveniently formulated as the free energy of an artificial thermodynamic system. This formulation shows the crucial importance that phase transitions and other thermodynamic concepts have for statistical modeling from an information-theoretic point of view.
Minimum description complexity
TLDR
The proposed method answers the challenging question of quality evaluation in identification of stable LTI systems under a fair prior assumption on the unmodeled dynamics and provides a new solution to a class of denoising problems.
Model selection by Normalized Maximum Likelihood
Model Selection Using Information Theory and the MDL Principle
TLDR
The minimum description length (MDL) principle picks the model with the smallest description length, balancing fit against complexity; the approach is illustrated by comparing tree-based models with regressions.
Minimum Description Length Model Selection Criteria for Generalized Linear Models
TLDR
This paper derives several model selection criteria for generalized linear models (GLMs) following the principle of Minimum Description Length (MDL), and shows that mixture MDL can "bridge" the selection "extremes" AIC and BIC in the sense that it can mimic the performance of either criterion; a small sketch of those extremes follows.
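To see what "bridging" means, note that AIC and BIC differ only in the per-parameter penalty: 2 for AIC versus log n for BIC. A hedged sketch of our own (the paper's mixture-MDL criterion itself is not reproduced here):

```python
# AIC and BIC as functions of the negative log-likelihood: the only
# difference is the per-parameter penalty (2 vs. log n).
import numpy as np

def aic(neg_log_lik, k):
    return 2.0 * neg_log_lik + 2.0 * k          # fixed penalty per parameter

def bic(neg_log_lik, k, n):
    return 2.0 * neg_log_lik + k * np.log(n)    # penalty grows with sample size

# For n > e^2 (about 7.4), BIC penalizes extra parameters more heavily than
# AIC and so favors smaller models; a criterion whose effective penalty can
# move between 2 and log n can mimic either, which is the sense of "bridging".
for n in (10, 100, 10_000):
    print(n, "AIC penalty/param: 2.0", "BIC penalty/param:", round(np.log(n), 2))
```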
Advances in Probabilistic Graphical Models
TLDR
This thesis considers Bayesian network models and implements a score-based learning algorithm using the CMDL as a scoring function, the greedy hill climber as a search procedure and with the set of covering networks as the search space.
MINIMUM DESCRIPTION LENGTH PRINCIPLE FOR LINEAR MIXED EFFECTS MODELS
TLDR
This paper considers data with repeated measurements and studies the selection of fixed-effect covariates for linear mixed-effects models. It proposes a class of MDL procedures that incorporate the dependence structure within individuals or clusters and use data-adaptive penalties suited to both finite- and infinite-dimensional data-generating mechanisms.
Greedy and Relaxed Approximations to Model Selection: A Simulation Study
TLDR
This article performs extensive simulations comparing two algorithms for generating candidate models that mimic the best subsets of predictors of a given size, and finds that one method often cannot serve both selection and prediction purposes.
Minimum Description Length Model Selection in Gaussian Regression under Data Constraints
TLDR
The effect of the data constraints on the selection criterion for Gaussian linear regression is demonstrated; indeed, various forms of the criterion are obtained by reformulating the shape of the data constraints.
...
...

References

The Minimum Description Length Principle in Coding and Modeling
TLDR
The normalized maximum likelihood, mixture, and predictive codings are each shown to achieve the stochastic complexity to within asymptotically vanishing terms.
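For a concrete view of the normalized maximum likelihood (NML) code, here is a small illustration of our own for the Bernoulli model, where the stochastic complexity of a binary sequence is the maximized negative log-likelihood plus the log of the parametric complexity COMP_n:

```python
# Hedged NML sketch for the Bernoulli model (illustrative, not from the paper):
# stochastic complexity = -log p(x | theta_hat(x)) + log COMP_n, where COMP_n
# sums the maximized likelihood over all 2^n binary sequences (grouped by the
# count k of ones, since the maximized likelihood depends only on k).
import math

def bernoulli_max_lik(k, n):
    """Maximized likelihood of a length-n binary sequence with k ones."""
    if k in (0, n):
        return 1.0
    p = k / n
    return p**k * (1 - p)**(n - k)

def parametric_complexity(n):
    """COMP_n: maximized likelihood summed over every possible sequence."""
    return sum(math.comb(n, k) * bernoulli_max_lik(k, n) for k in range(n + 1))

n, k = 20, 15  # e.g. 15 ones observed in 20 binary trials
sc = -math.log(bernoulli_max_lik(k, n)) + math.log(parametric_complexity(n))
print(f"stochastic complexity: {sc:.3f} nats")
```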
A new look at the statistical model identification
The history of the development of statistical hypothesis testing in time series analysis is reviewed briefly, and it is pointed out that the hypothesis testing procedure is not adequately defined as the procedure for statistical model identification.
Bayes Factors and Choice Criteria for Linear Models
Global and local Bayes factors are defined and their respective roles examined as choice criteria among alternative linear models. The global Bayes factor is seen to function, in appropriate…
Data compression and histograms
TLDR
The relationship between code length and the selection of the number of bins for a histogram density is considered for a sequence of iid observations on [0, 1]. A uniform almost-sure asymptotic expansion for the code length is given and used to prove the asymptotic optimality of the selection rule.
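In that spirit, a minimal sketch (our simplified two-part criterion, not the paper's exact code length) of choosing the number of equal-width bins on [0, 1] by minimizing description length:

```python
# Choose the number of histogram bins by (approximate) code length:
# -log max likelihood of the histogram density, plus ((m - 1) / 2) log n
# for the m - 1 free cell probabilities.
import numpy as np

def histogram_code_length(x, m):
    n = x.size
    counts, _ = np.histogram(x, bins=m, range=(0.0, 1.0))
    nz = counts[counts > 0]
    neg_log_lik = -np.sum(nz * np.log(m * nz / n))  # density height is m*n_j/n
    return neg_log_lik + 0.5 * (m - 1) * np.log(n)

rng = np.random.default_rng(1)
x = rng.beta(2, 5, size=1000)                       # iid observations on [0, 1]
best_m = min(range(1, 51), key=lambda m: histogram_code_length(x, m))
print("selected number of bins:", best_m)
```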
Spline Adaptation in Extended Linear Models
TLDR
Various alternatives to greedy, deterministic schemes are considered, and a Bayesian framework for studying adaptation in the context of an extended linear model (ELM) is presented; the major test cases are Logspline density estimation and Triogram regression models.
Bayesian Model Averaging for Linear Regression Models
We consider the problem of accounting for model uncertainty in linear regression models. Conditioning on a single selected model ignores model uncertainty and thus leads to underestimation of uncertainty when making inferences about quantities of interest.
Model selection and prediction: Normal regression
TLDR
A new lower bound is provided for prediction without refitting, while a lower bound for prediction with refitting was given by Rissanen, and a set of sufficient conditions for a model selection criterion to achieve these bounds are specified.
Density estimation by stochastic complexity
TLDR
Two theorems are proved which together extend the universal coding theorems to a large class of data-generating densities. They give an asymptotic upper bound for the code redundancy in order of magnitude, achieved with a special predictive type of histogram estimator, which sharpens a related bound.
Generalised linear model selection by the predictive least quasi-deviance criterion
We consider the problem of selecting a model with the best predictive ability in a class of generalised linear models. A predictive least quasi-deviance criterion is proposed to measure the…
Information-theoretic asymptotics of Bayes methods
TLDR
The authors examine the relative entropy distance D_n between the true density and the Bayesian density and show that the asymptotic distance is (d/2) log n + c, where d is the dimension of the parameter vector.
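For reference, the constant c is explicit in Clarke and Barron's expansion; under the usual regularity conditions, our rendering of the standard statement is:

```latex
% Clarke-Barron asymptotics for the Bayes (mixture) code redundancy at an
% interior parameter value theta, with prior density w and Fisher
% information matrix I(theta):
\[
  D_n \;=\; \frac{d}{2}\log\frac{n}{2\pi e}
        \;+\; \frac{1}{2}\log\det I(\theta)
        \;-\; \log w(\theta) \;+\; o(1),
\]
% so c collects the (d/2) log(1/(2 pi e)), Fisher-information, and prior terms.
```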
...
...