# A survey of cross-validation procedures for model selection

@article{Arlot2010ASO, title={A survey of cross-validation procedures for model selection}, author={Sylvain Arlot and Alain Celisse}, journal={Statistics Surveys}, year={2010}, volume={4}, pages={40-79} }

Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its apparent universality. Many results exist on the model selection performances of cross-validation procedures. This survey intends to relate these results to the most recent advances of model selection theory, with a particular emphasis on distinguishing empirical statements from rigorous theoretical results. As a conclusion, guidelines are provided…
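As a rough illustration of the two uses named in the abstract (risk estimation and model selection), the following minimal sketch estimates the risk of two candidate regressors by V-fold cross-validation and selects the one with the smaller estimate. It is not taken from the survey: the interleaved fold assignment, the squared-error loss, the toy data, and all names (`fit_constant`, `fit_linear`, `vfold_risk`, `select_model`) are assumptions chosen for brevity.

```python
from statistics import mean


def fit_constant(xs, ys):
    """Hypothetical candidate 1: always predict the training mean of y."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Hypothetical candidate 2: simple least-squares line y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def vfold_risk(fit, xs, ys, v):
    """V-fold cross-validation estimate of the squared-error risk of `fit`."""
    n = len(xs)
    total = 0.0
    for k in range(v):
        # Assumption: interleaved folds (indices k, k + v, k + 2v, ...).
        test_idx = set(range(k, n, v))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        predictor = fit(train_x, train_y)  # train on the other v - 1 folds
        total += sum((ys[i] - predictor(xs[i])) ** 2 for i in test_idx)
    return total / n  # average held-out loss over all n points


def select_model(candidates, xs, ys, v=5):
    """Return the candidate name with the smallest cross-validated risk."""
    risks = {name: vfold_risk(fit, xs, ys, v) for name, fit in candidates.items()}
    return min(risks, key=risks.get), risks


# Toy data (an assumption): a line plus a small deterministic perturbation.
xs = [i / 10 for i in range(20)]
ys = [1 + 2 * x + 0.1 * (-1) ** i for i, x in enumerate(xs)]

best, risks = select_model({"constant": fit_constant, "linear": fit_linear}, xs, ys)
print(best, risks)
```

On this toy data the linear candidate has the smaller estimated risk, so it is the one selected. Other choices of V, of the loss, or of the splitting scheme give other procedures in the same family.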

## 2,963 Citations

Cross-Validation, Risk Estimation, and Model Selection

- Computer Science
- 2019

A simple non-parametric setting is discussed, and it is found that cross-validation is asymptotically uninformative about the expected test error of any given predictive rule, but allows for asymptotically consistent model selection.

A Review of Cross Validation and Adaptive Model Selection

- Biology
- 2011

Model selection procedures are reviewed, in particular various cross-validation procedures and adaptive model selection, and the connections between different procedures and information criteria are explored.

On the usefulness of cross-validation for directional forecast evaluation

- Computer Science
- Comput. Stat. Data Anal.
- 2014

Model selection criteria based on cross-validatory concordance statistics

- Computer Science
- Comput. Stat.
- 2018

This work presents the development and investigation of three model selection criteria based on cross-validatory analogues of the traditional and adjusted c-statistics designed to estimate three corresponding measures of predictive error, and shows that these estimators serve as suitable model selection criteria.

Model selection for estimation of causal parameters

- Mathematics
- 2020

A popular technique for selecting and tuning machine learning estimators is cross-validation. Cross-validation evaluates overall model fit, usually in terms of predictive accuracy. This may lead to…

Best subset selection via cross-validation criterion

- Computer Science
- 2020

The purpose of this paper is to establish a mixed-integer optimization approach to selecting the best subset of explanatory variables via the cross-validation criterion, which can be formulated as a bilevel MIO problem and reduced to a single-level mixed-integer quadratic optimization problem.

Model selection for estimation of causal parameters

- Mathematics
- 2020

A popular technique for selecting and tuning machine learning estimators is cross-validation. Cross-validation evaluates overall model fit, usually in terms of predictive accuracy. In causal…

Cross-Validation, Risk Estimation, and Model Selection: Comment on a Paper by Rosset and Tibshirani

- Computer Science
- 2020

How best to estimate the accuracy of a predictive rule has been a longstanding question in statistics. Approaches to this task range from simple methods like Mallows’s Cp to algorithmic techniques…

The connection between cross-validation and Akaike information criterion in a semiparametric family

- Mathematics
- 2013

Both Akaike information criterion and cross-validation are important tools in model selection. Stone [(1977), ‘An Asymptotic Equivalence of Choice of Model by Cross-Validation and Akaike’s Criterion’,…

## References

Linear Model Selection by Cross-validation

- Mathematics, Computer Science
- 2009

This work considers the problem of model selection in the classical regression model based on cross-validation with an additional penalty term for penalizing overfitting with a new criterion that chooses the smallest true model with probability one.

Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation

- Mathematics
- 1983

We construct a prediction rule on the basis of some data, and then wish to estimate the error rate of this rule in classifying future observations. Cross-validation provides a nearly…

Unified Cross-Validation Methodology For Selection Among Estimators and a General Cross-Validated Adaptive Epsilon-Net Estimator: Finite Sample Oracle Inequalities and Examples

- Mathematics, Computer Science
- 2003

Under general conditions, the optimality results now show that the corresponding cross-validation selector performs asymptotically exactly as well as the selector which for each given data set makes the best choice (knowing the true full data distribution).

Model Selection Via Multifold Cross Validation

- Computer Science
- 1993

Two notions of multi-fold cross validation (MCV and MCV*) criteria are considered and it turns out that MCV indeed reduces the chance of overfitting.

An alternative method of cross-validation for the smoothing of density estimates

- Computer Science
- 1984

An alternative method of cross-validation, based on integrated squared error, recently also proposed by Rudemo (1982), is derived, and Hall (1983) has established the consistency and asymptotic optimality of the new method.

Robust Linear Model Selection by Cross-Validation

- Computer Science
- 1997

A robust algorithm for model selection in regression models using Shao's cross-validation methods for choice of variables as a starting point is provided, demonstrating a substantial improvement in choosing the correct model in the presence of outliers with little loss of efficiency at the normal model.

CONSISTENCY OF CROSS VALIDATION FOR COMPARING REGRESSION PROCEDURES

- Computer Science
- 2007

It is shown that under some conditions, with an appropriate choice of data splitting ratio, cross validation is consistent in the sense of selecting the better procedure with probability approaching 1.

On the use of cross-validation to assess performance in multivariate prediction

- Mathematics
- Stat. Comput.
- 2000

We describe a Monte Carlo investigation of a number of variants of cross-validation for the assessment of performance of predictive models, including different values of k in leave-k-out…

Cross-Validation of Regression Models

- Environmental Science
- 1984

Attention is given to models obtained via subset selection procedures, which are extremely difficult to evaluate by standard techniques, and their use is illustrated in examples.