A survey of cross-validation procedures for model selection
Sylvain Arlot and Alain Celisse, Statistics Surveys
Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and apparent universality. Many results exist on the model selection performance of cross-validation procedures. This survey relates these results to the most recent advances in model selection theory, with a particular emphasis on distinguishing empirical statements from rigorous theoretical results. As a conclusion, guidelines are provided…
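The survey's two uses of cross-validation, risk estimation and model selection, can be illustrated with a minimal k-fold sketch. This is not taken from any of the listed papers; the function names and the toy candidates in the usage note are illustrative only.

```python
import random

def k_fold_cv_risk(data, fit, loss, k=5, seed=0):
    """Estimate the prediction risk of a fitting procedure by k-fold CV.

    data : list of (x, y) pairs
    fit  : training list -> predictor (a function x -> yhat)
    loss : (yhat, y) -> float
    """
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    # Partition the shuffled indices into k (near-)equal folds.
    folds = [idx[i::k] for i in range(k)]
    fold_risks = []
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in range(len(data)) if i not in held_out]
        test = [data[i] for i in fold]
        predictor = fit(train)
        # Average held-out loss on this fold.
        fold_risks.append(
            sum(loss(predictor(x), y) for x, y in test) / len(test)
        )
    return sum(fold_risks) / k

def select_model(data, fitters, loss, k=5):
    """Model selection: return the candidate with the smallest CV risk."""
    return min(fitters, key=lambda f: k_fold_cv_risk(data, f, loss, k))
```

For example, with squared loss, `select_model` prefers a constant-mean predictor over an always-zero predictor on data whose responses are all 2.0, since the former has cross-validated risk 0.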
Cross-Validation, Risk Estimation, and Model Selection
A simple non-parametric setting is discussed; cross-validation is found to be asymptotically uninformative about the expected test error of any given predictive rule, yet it allows asymptotically consistent model selection.
A Review of Cross Validation and Adaptive Model Selection
A review of model selection procedures, in particular various cross-validation procedures and adaptive model selection, is presented, and the connections between the different procedures and information criteria are explored.
On the usefulness of cross-validation for directional forecast evaluation
Model selection criteria based on cross-validatory concordance statistics
This work presents the development and investigation of three model selection criteria based on cross-validatory analogues of the traditional and adjusted c-statistics, designed to estimate three corresponding measures of predictive error, and shows that these estimators serve as suitable model selection criteria.
Model selection for estimation of causal parameters
A popular technique for selecting and tuning machine learning estimators is cross-validation. Cross-validation evaluates overall model fit, usually in terms of predictive accuracy. This may lead to
Best subset selection via cross-validation criterion
The purpose of this paper is to establish a mixed-integer optimization (MIO) approach to selecting the best subset of explanatory variables via the cross-validation criterion; the problem can be formulated as a bilevel MIO problem and reduced to a single-level mixed-integer quadratic optimization problem.
Cross-Validation, Risk Estimation, and Model Selection: Comment on a Paper by Rosset and Tibshirani
How best to estimate the accuracy of a predictive rule has been a longstanding question in statistics. Approaches to this task range from simple methods like Mallows's Cp to algorithmic techniques
The connection between cross-validation and Akaike information criterion in a semiparametric family
Both the Akaike information criterion and cross-validation are important tools in model selection. Stone [(1977), ‘An Asymptotic Equivalence of Choice of Model by Cross-Validation and Akaike's Criterion’,
Linear Model Selection by Cross-validation
This work considers the problem of model selection in the classical regression model, based on cross-validation with an additional penalty term for penalizing overfitting; the new criterion chooses the smallest true model with probability one.
Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation
Abstract We construct a prediction rule on the basis of some data, and then wish to estimate the error rate of this rule in classifying future observations. Cross-validation provides a nearly
Unified Cross-Validation Methodology For Selection Among Estimators and a General Cross-Validated Adaptive Epsilon-Net Estimator: Finite Sample Oracle Inequalities and Examples
Under general conditions, the optimality results now show that the corresponding cross-validation selector performs asymptotically exactly as well as the selector which, for each given data set, makes the best choice (knowing the true full data distribution).
Model Selection Via Multifold Cross Validation
Two notions of multifold cross-validation criteria (MCV and MCV*) are considered, and it turns out that MCV indeed reduces the chance of overfitting.
An alternative method of cross-validation for the smoothing of density estimates
An alternative method of cross-validation, based on integrated squared error and also proposed recently by Rudemo (1982), is derived; Hall (1983) has established the consistency and asymptotic optimality of the new method.
Robust Linear Model Selection by Cross-Validation
A robust algorithm for model selection in regression models is provided, using Shao's cross-validation methods for variable selection as a starting point; it demonstrates a substantial improvement in choosing the correct model in the presence of outliers, with little loss of efficiency at the normal model.
It is shown that under some conditions, with an appropriate choice of data splitting ratio, cross validation is consistent in the sense of selecting the better procedure with probability approaching 1.
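The role of the data-splitting ratio can be sketched with a repeated sample-splitting selector: each candidate is trained on a fraction of the data and scored on the held-out rest, averaging over random splits. This is a toy illustration, not the construction from any of the listed papers; the default validation fraction of 0.75 merely reflects the heuristic that a large validation set favors consistent selection.

```python
import random

def split_select(data, fitters, loss, val_frac=0.75, reps=25, seed=0):
    """Repeated sample splitting: train each candidate on a (1 - val_frac)
    fraction of the data, score it on the held-out remainder, average the
    validation loss over `reps` random splits, and return the index of the
    lowest-scoring candidate."""
    rng = random.Random(seed)
    n = len(data)
    # Keep at least one point on each side of the split.
    n_val = max(1, min(n - 1, int(val_frac * n)))
    scores = [0.0] * len(fitters)
    for _ in range(reps):
        idx = list(range(n))
        rng.shuffle(idx)
        val, train = idx[:n_val], idx[n_val:]
        train_data = [data[i] for i in train]
        for j, fit in enumerate(fitters):
            pred = fit(train_data)
            scores[j] += sum(
                loss(pred(data[i][0]), data[i][1]) for i in val
            ) / len(val)
    # Smallest accumulated validation loss wins.
    return min(range(len(fitters)), key=scores.__getitem__)
```

With squared loss on noiseless data generated as y = 3x, a least-squares slope-through-the-origin candidate achieves zero validation loss on every split and is selected over an always-zero predictor.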
On the use of cross-validation to assess performance in multivariate prediction
We describe a Monte Carlo investigation of a number of variants of cross-validation for the assessment of performance of predictive models, including different values of k in leave-k-out
Cross-Validation of Regression Models
Attention is given to models obtained via subset selection procedures, which are extremely difficult to evaluate by standard techniques, and their use is illustrated in examples.
A local cross-validation algorithm