# Optimal cross-validation in density estimation with the $L^{2}$-loss

@article{Celisse2014OptimalCI, title={Optimal cross-validation in density estimation with the $L^{2}$-loss}, author={Alain Celisse}, journal={Annals of Statistics}, year={2014}, volume={42}, pages={1879--1910} }

We analyze the performance of cross-validation (CV) in the density estimation framework with two purposes: (i) risk estimation and (ii) model selection. The main focus is on the so-called leave-$p$-out CV procedure (Lpo), where $p$ denotes the cardinality of the test set. Closed-form expressions are derived for the Lpo estimator of the risk of projection estimators. These expressions provide a great improvement upon $V$-fold cross-validation in terms of variability and computational…
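As a concrete illustration of the closed-form idea in the simplest case, the sketch below (an assumption for illustration, not the paper's general Lpo formula) computes the leave-one-out ($p=1$) least-squares CV risk of a histogram estimator two ways: by direct enumeration of the $n$ leave-one-out estimators, and via the standard closed-form expression in the bin counts. The two agree exactly, which is the kind of enumeration-free evaluation the paper's Lpo expressions generalize.

```python
import numpy as np

def histogram_lscv(x, h, lo, hi):
    """Leave-one-out least-squares CV risk of a histogram with bin width h
    on [lo, hi), computed two ways:

    (a) directly, using the leave-one-out values
        fhat_{-i}(X_i) = (nu_{j(i)} - 1) / ((n - 1) h), and
    (b) via the standard closed-form expression
        LSCV(h) = 2/((n-1)h) - (n+1)/(n^2 (n-1) h) * sum_j nu_j^2,
    where nu_j is the count of the j-th bin.  Returns (direct, closed).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    edges = np.arange(lo, hi + h, h)
    counts, _ = np.histogram(x, bins=edges)

    # integral of fhat^2 over the grid: sum_j (nu_j / (n h))^2 * h
    int_f2 = np.sum(counts.astype(float) ** 2) / (n**2 * h)
    # bin index of each observation, then the leave-one-out value at each point
    j = np.digitize(x, edges) - 1
    loo = (counts[j] - 1.0) / ((n - 1) * h)
    direct = int_f2 - (2.0 / n) * loo.sum()

    closed = 2.0 / ((n - 1) * h) \
        - (n + 1) / (n**2 * (n - 1) * h) * np.sum(counts.astype(float) ** 2)
    return direct, closed

rng = np.random.default_rng(0)
x = rng.normal(size=500)           # all samples fall well inside [-8, 8)
d, c = histogram_lscv(x, h=0.5, lo=-8.0, hi=8.0)
assert abs(d - c) < 1e-10          # direct enumeration matches the closed form
```

The grid endpoints `lo`, `hi` and the bin width are arbitrary choices here; the point is that the closed form avoids looping over held-out points entirely.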

## 37 Citations

Choice of V for V-Fold Cross-Validation in Least-Squares Density Estimation

- Mathematics · J. Mach. Learn. Res.
- 2016

A non-asymptotic oracle inequality is proved for V-fold cross-validation and its bias-corrected version (V-fold penalization), implying that V- fold penalization is asymptotically optimal in the nonparametric case.
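To make the object of study concrete, here is a minimal sketch (assumed for illustration, not the cited paper's procedure) of V-fold least-squares CV used to pick a histogram bin width, where the choice of `V` controls the bias/variability trade-off the paper analyzes:

```python
import numpy as np

def vfold_lscv(x, h, V, lo, hi, seed=0):
    """V-fold least-squares CV risk estimate for a histogram with bin width h.

    Each fold is held out in turn; the histogram is fit on the remaining
    V-1 folds and scored by the empirical L2 contrast
        int fhat^2 - 2 * mean_i fhat(X_i),
    averaged over folds (the constant int f^2 does not depend on h).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    folds = np.array_split(np.random.default_rng(seed).permutation(n), V)
    edges = np.arange(lo, hi + h, h)
    scores = []
    for test in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test] = False
        counts, _ = np.histogram(x[train_mask], bins=edges)
        fhat = counts / (train_mask.sum() * h)   # piecewise-constant density
        int_f2 = np.sum(fhat**2) * h             # integral of fhat^2
        j = np.digitize(x[test], edges) - 1      # bin of each held-out point
        scores.append(int_f2 - 2.0 * fhat[j].mean())
    return float(np.mean(scores))

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
grid = [0.1, 0.2, 0.4, 0.8, 1.6]
scores = {h: vfold_lscv(x, h, V=5, lo=-8.0, hi=8.0) for h in grid}
best_h = min(scores, key=scores.get)
```

The candidate grid and `V=5` are arbitrary; the cited work is precisely about how the quality of such a selection depends on `V`.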

Theoretical Analysis of Cross-Validation for Estimating the Risk of the $k$-Nearest Neighbor Classifier

- Computer Science · J. Mach. Learn. Res.
- 2018

A general strategy is described for deriving moment and exponential concentration inequalities for the leave-$p$-out (Lpo) estimator applied to the $k$-nearest neighbors ($k$NN) rule in the context of binary classification.

Local asymptotics of cross-validation in least-squares density estimation

- Mathematics
- 2021

In model selection, several types of cross-validation are commonly used and many variants have been introduced. While consistency of some of these methods has been proven, their rate of convergence…

Estimating the Kullback–Leibler risk based on multifold cross-validation

- Mathematics
- 2015

This paper concerns a class of model selection criteria based on cross-validation techniques and estimative predictive densities. Both the simple or leave-one-out and the multifold or leave-m-out…

Bias-aware model selection for machine learning of doubly robust functionals

- Mathematics, Economics
- 2019

An oracle property is established for a multi-fold cross-validation version of the new model selection criteria, which states that the empirical criteria perform nearly as well as an oracle with a priori knowledge of the pseudo-risk of each candidate model.

Targeted Cross-Validation

- Computer Science · ArXiv
- 2021

This work proposes targeted cross-validation (TCV) to select models or procedures based on a general weighted L2 loss and shows that TCV is consistent in selecting the best-performing candidate under the weighted L2 loss.
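The weighted-loss idea can be sketched as follows (a minimal, hypothetical illustration in the histogram setting, not the TCV paper's actual procedure): up to a term not depending on the estimator, the weighted L2 risk is estimated by a weighted version of the usual CV contrast, with a weight function `w` emphasizing the region of interest.

```python
import numpy as np

def weighted_lscv(x_train, x_val, h, w, lo, hi):
    """CV score of a histogram under a weighted L2 loss (illustrative sketch).

    Up to a term free of the estimator, the weighted L2 risk
        int w(t) (fhat(t) - f(t))^2 dt
    is estimated by  int w fhat^2 - 2 * mean_i w(X_i) fhat(X_i),
    with the mean taken over held-out points X_i.
    """
    edges = np.arange(lo, hi + h, h)
    counts, _ = np.histogram(x_train, bins=edges)
    fhat = counts / (len(x_train) * h)
    mids = 0.5 * (edges[:-1] + edges[1:])
    int_wf2 = np.sum(w(mids) * fhat**2) * h      # quadrature on bin midpoints
    j = np.digitize(x_val, edges) - 1            # bin of each held-out point
    return float(int_wf2 - 2.0 * np.mean(w(x_val) * fhat[j]))

rng = np.random.default_rng(3)
x = rng.normal(size=600)
w = lambda t: np.exp(-t**2)   # hypothetical weight: accuracy near the origin
score = weighted_lscv(x[:400], x[400:], h=0.4, w=w, lo=-8.0, hi=8.0)
```

Setting `w` to the constant 1 recovers the ordinary (unweighted) least-squares CV contrast.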

New upper bounds on cross-validation for the k-Nearest Neighbor classification rule

- Computer Science
- 2015

A new strategy is provided for deriving bounds on moments of the leave-$p$-out estimator used to assess the performance of the $k$NN classifier, and these moment upper bounds are used to establish a new exponential concentration inequality for binary classification.

Learning high-dimensional probability distributions using tree tensor networks

- Computer Science · International Journal for Uncertainty Quantification
- 2022

We consider the problem of the estimation of a high-dimensional probability distribution using model classes of functions in tree-based tensor formats, a particular case of tensor networks associated…

Evolutionary cross validation

- Computer Science · 2017 8th International Conference on Information Technology (ICIT)
- 2017

An evolutionary cross-validation algorithm for identifying optimal folds in a dataset to improve predictive-modeling accuracy is proposed; experimental results suggest that the proposed algorithm provides a significant improvement over the baseline 10-fold cross-validation.

Asymptotic Properties of a Class of Criteria for Best Model Selection

- Mathematics · 2020 IEEE 15th International Conference on Computer Sciences and Information Technologies (CSIT)
- 2020

The paper investigates the asymptotic convergence of some typical criteria for model selection from a given data sample. A range of known criteria are generalized into a special class joining two…

## References

Showing 1–10 of 94 references

Model selection via cross-validation in density estimation, regression, and change-points detection

- Computer Science
- 2008

A fully resampling-based procedure is proposed, which makes it possible to deal with the hard problem of heteroscedasticity while keeping a reasonable computational complexity.

Nonparametric density estimation by exact leave-p-out cross-validation

- Mathematics · Comput. Stat. Data Anal.
- 2008

Asymptotics of cross-validated risk estimation in estimator selection and performance assessment

- Mathematics, Computer Science
- 2005

Linear Model Selection by Cross-validation

- Mathematics
- 1993

Abstract We consider the problem of selecting a model having the best predictive ability among a class of linear models. The popular leave-one-out cross-validation method, which is asymptotically…

Consistency of cross validation for comparing regression procedures

- Computer Science
- 2007

It is shown that under some conditions, with an appropriate choice of data-splitting ratio, cross-validation is consistent in the sense of selecting the better procedure with probability approaching 1.

Risk bounds for model selection via penalization

- Mathematics, Computer Science
- 1999

It is shown that the quadratic risk of the minimum penalized empirical contrast estimator is bounded by an index of the accuracy of the sieve, which quantifies the trade-off among the candidate models between the approximation error and parameter dimension relative to sample size.

A leave-p-out based estimation of the proportion of null hypotheses

- Mathematics
- 2008

In the multiple testing context, a challenging problem is the estimation of the proportion $\pi_0$ of true null hypotheses. A large number of estimators of this quantity rely on identifiability…

Adaptive Model Selection Using Empirical Complexities

- Computer Science
- 1998

The estimates are shown to achieve a favorable tradeoff between approximation and estimation error, and to perform as well as if the distribution-dependent complexities of the model classes were known beforehand, when each model class has an infinite VC or pseudo dimension.

Model Selection and Error Estimation

- Computer Science, Mathematics · Machine Learning
- 2004

A tight relationship between error estimation and data-based complexity penalization is pointed out: any good error estimate may be converted into a data-based penalty function, and the performance of the estimate is governed by the quality of the error estimate.

An alternative method of cross-validation for the smoothing of density estimates

- Computer Science
- 1984

An alternative method of cross-validation, based on integrated squared error and also proposed by Rudemo (1982), is derived; Hall (1983) established the consistency and asymptotic optimality of the new method.