Optimal cross-validation in density estimation with the $L^{2}$-loss

  title={Optimal cross-validation in density estimation with the $L^{2}$-loss},
  author={Alain Celisse},
  journal={Annals of Statistics},
We analyze the performance of cross-validation (CV) in the density estimation framework with two purposes: (i) risk estimation and (ii) model selection. The main focus is given to the so-called leave-$p$-out CV procedure (Lpo), where $p$ denotes the cardinality of the test set. Closed-form expressions are settled for the Lpo estimator of the risk of projection estimators. These expressions provide a great improvement upon $V$-fold cross-validation in terms of variability and computational…
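For the simplest case, $p = 1$ (leave-one-out), the closed-form idea in the abstract can be illustrated with the classical identity for histogram estimators: the leave-one-out estimate of the $L^2$ risk reduces to a formula in the bin counts alone, with no refitting per split. The sketch below is a minimal illustration of that identity, not the paper's general Lpo expressions; all function names are our own.

```python
import numpy as np

def histogram_l2_loo_risk(x, n_bins, lo=0.0, hi=1.0):
    """Closed-form leave-one-out (p = 1) CV estimate of the L^2 risk of a
    histogram density estimator with n_bins equal-width bins on [lo, hi].

    Uses the classical identity
        J(h) = 2 / ((n - 1) h) - (n + 1) / ((n - 1) h) * sum_j p_j^2,
    where h is the bin width and p_j = n_j / n are the bin proportions,
    so no explicit refitting over the n leave-one-out splits is needed.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    h = (hi - lo) / n_bins
    counts, _ = np.histogram(x, bins=n_bins, range=(lo, hi))
    p = counts / n
    return 2.0 / ((n - 1) * h) - (n + 1) / ((n - 1) * h) * np.sum(p ** 2)

def select_n_bins(x, candidates, lo=0.0, hi=1.0):
    """Pick the bin count minimising the estimated L^2 risk."""
    risks = [histogram_l2_loo_risk(x, m, lo, hi) for m in candidates]
    return candidates[int(np.argmin(risks))]

rng = np.random.default_rng(0)
sample = rng.beta(2.0, 5.0, size=500)  # density supported on [0, 1]
best = select_n_bins(sample, candidates=list(range(2, 40)))
```

Because the risk estimate depends on the data only through the bin counts, evaluating it over a grid of bin widths costs one histogram per candidate, which is the kind of computational saving over refit-based $V$-fold CV the abstract refers to.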
Choice of V for V-Fold Cross-Validation in Least-Squares Density Estimation
A non-asymptotic oracle inequality is proved for V-fold cross-validation and its bias-corrected version (V-fold penalization), implying that V-fold penalization is asymptotically optimal in the nonparametric case.
Estimating the Kullback–Leibler risk based on multifold cross-validation
This paper concerns a class of model selection criteria based on cross-validation techniques and estimative predictive densities. Both the simple or…
An efficient variance estimator for cross-validation under partition sampling
  • Qing Wang, Xizhen Cai
  • Mathematics
  • Statistics
  • 2021
This paper concerns the problem of variance estimation of cross-validation. We consider the unbiased cross-validation risk estimate in the form of a general U-statistic and focus on estimating the…
Local asymptotics of cross-validation in least-squares density estimation
In model selection, several types of cross-validation are commonly used and many variants have been introduced. While consistency of some of these methods has been proven, their rate of convergence…
Contributions à la calibration d'algorithmes d'apprentissage : Validation-croisée et détection de ruptures [Contributions to the calibration of learning algorithms: cross-validation and change-point detection]
The optimality of the LpO-based model selection procedure is proved under some conditions, both for the estimation purpose (by means of a non-asymptotic oracle inequality) and for the identification purpose (through a model consistency result).
Targeted Cross-Validation
This work proposes a targeted cross-validation (TCV) to select models or procedures based on a general weighted L2 loss and shows that the TCV is consistent in selecting the best performing candidate under the weighted L2 loss.
Bias-aware model selection for machine learning of doubly robust functionals
An oracle property is established for a multi-fold cross-validation version of the new model selection criteria which states that the empirical criteria perform nearly as well as an oracle with a priori knowledge of the pseudo-risk for each candidate model.
Learning high-dimensional probability distributions using tree tensor networks
We consider the problem of the estimation of a high-dimensional probability distribution using model classes of functions in tree-based tensor formats, a particular case of tensor networks associated…
A concentration inequality for the excess risk in least-squares regression with random design and heteroscedastic noise
We prove a new and general concentration inequality for the excess risk in least-squares regression with random design and heteroscedastic noise. No specific structure is required on the model…
Evolutionary cross validation
An evolutionary cross-validation algorithm for identifying optimal folds in a dataset to improve predictive modeling accuracy is proposed; experimental results suggest that the proposed algorithm provides significant improvement over the baseline 10-fold cross-validation.


Model selection via cross-validation in density estimation, regression, and change-points detection
In this thesis, we aim at studying a family of resampling algorithms, referred to as cross-validation, and especially of one of them named leave-$p$-out. Extensively used in practice, these…
Nonparametric density estimation by exact leave-p-out cross-validation
The problem of density estimation is addressed by minimization of the $L^2$-risk for both histogram and kernel estimators. This quadratic risk is estimated by leave-$p$-out cross-validation (LPO), which…
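The value of an exact (closed-form) LPO computation is clearest against the naive alternative: applied literally, the definition averages the estimated quadratic risk over all $\binom{n}{p}$ train/test splits. The sketch below computes that brute-force average for a histogram estimator on $[0,1]$; it is feasible only for tiny $n$, which is exactly what the closed-form expressions avoid. Function names are illustrative, not from the paper.

```python
from itertools import combinations
import numpy as np

def histogram_density(train, n_bins):
    """Histogram density estimate on [0, 1] from a training subsample.
    Returns the per-bin heights and the bin width."""
    counts, _ = np.histogram(train, bins=n_bins, range=(0.0, 1.0))
    h = 1.0 / n_bins
    return counts / (train.size * h), h

def lpo_risk_naive(x, p, n_bins):
    """Leave-p-out L^2 risk estimate computed straight from the definition:
    average, over all C(n, p) test sets, of
        int f_train^2  -  (2 / p) * sum_{i in test} f_train(x_i).
    Exhaustive enumeration -- O(C(n, p)) work, feasible only for tiny n."""
    x = np.asarray(x, dtype=float)
    n = x.size
    total, count = 0.0, 0
    for test_idx in combinations(range(n), p):
        mask = np.zeros(n, dtype=bool)
        mask[list(test_idx)] = True
        heights, h = histogram_density(x[~mask], n_bins)
        # bin index of each held-out point (clipped for the right edge)
        idx = np.minimum((x[mask] * n_bins).astype(int), n_bins - 1)
        total += np.sum(heights ** 2) * h - 2.0 * np.mean(heights[idx])
        count += 1
    return total / count
```

Already at $n = 40$ and $p = 10$ the enumeration exceeds $10^8$ splits, so a closed-form expression in the bin counts is the only practical route for general $p$.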
Asymptotics of cross-validated risk estimation in estimator selection and performance assessment
Abstract Risk estimation is an important statistical question for the purposes of selecting a good estimator (i.e., model selection) and assessing its performance (i.e., estimating generalization…
Linear Model Selection by Cross-validation
Abstract We consider the problem of selecting a model having the best predictive ability among a class of linear models. The popular leave-one-out cross-validation method, which is asymptotically…
Theoretical developments on cross validation (CV) have mainly focused on selecting one among a list of finite-dimensional models (e.g., subset or order selection in linear regression) or selecting a…
Risk bounds for model selection via penalization
Abstract Performance bounds for criteria for model selection are developed using recent theory for sieves. The model selection criteria are based on an empirical loss or contrast function with an…
A leave-p-out based estimation of the proportion of null hypotheses
In the multiple testing context, a challenging problem is the estimation of the proportion $\pi_0$ of true-null hypotheses. A large number of estimators of this quantity rely on identifiability…
Adaptive Model Selection Using Empirical Complexities
Given $n$ independent replicates of a jointly distributed pair $(X, Y)$ in $\mathbb{R}^d \times \mathbb{R}$, we wish to select from a fixed sequence of model classes $F_1, F_2, \ldots$ a deterministic prediction rule $f: \mathbb{R}^d \to \mathbb{R}$ whose…
Model Selection and Error Estimation
A tight relationship between error estimation and data-based complexity penalization is pointed out: any good error estimate may be converted into a data-based penalty function, and the performance of the estimate is governed by the quality of the error estimate.
An alternative method of cross-validation for the smoothing of density estimates
Cross-validation with Kullback-Leibler loss function has been applied to the choice of a smoothing parameter in the kernel method of density estimation. A framework for this problem is constructed…
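The Kullback-Leibler cross-validation mentioned above is the likelihood-based counterpart of the $L^2$ criteria in the rest of this list: the bandwidth is chosen to maximise the leave-one-out log-likelihood $\sum_i \log \hat f_{-i}(X_i)$. A minimal sketch for a Gaussian kernel follows; the function names and the bandwidth grid are our own choices, not taken from the cited paper.

```python
import numpy as np

def loo_log_likelihood(x, h):
    """Leave-one-out log-likelihood of a Gaussian kernel density estimate
    with bandwidth h (the Kullback-Leibler cross-validation criterion)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # pairwise kernel matrix K[i, j] = phi((x_i - x_j) / h) / h
    diffs = (x[:, None] - x[None, :]) / h
    K = np.exp(-0.5 * diffs ** 2) / (h * np.sqrt(2.0 * np.pi))
    np.fill_diagonal(K, 0.0)           # drop the self-term: leave one out
    f_loo = K.sum(axis=1) / (n - 1)    # f_{-i}(x_i) for every i
    return np.sum(np.log(f_loo))

def select_bandwidth(x, grid):
    """Maximise the leave-one-out log-likelihood over a bandwidth grid."""
    scores = [loo_log_likelihood(x, h) for h in grid]
    return grid[int(np.argmax(scores))]

rng = np.random.default_rng(2)
data = rng.normal(size=200)
h_star = select_bandwidth(data, grid=np.geomspace(0.05, 2.0, 30))
```

Note that the whole leave-one-out sweep costs one $n \times n$ kernel matrix per bandwidth, since removing observation $i$ only zeroes the diagonal term; no refitting per split is needed, mirroring the closed-form theme of the main paper.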