Risk bounds for model selection via penalization

  • Andrew R. Barron, L. Birgé, Pascal Massart
  • Probability Theory and Related Fields
Performance bounds for model selection criteria are developed using recent theory for sieves. The criteria are based on an empirical loss or contrast function with an added penalty term, motivated by empirical process theory and roughly proportional to the number of parameters needed to describe the model divided by the number of observations. Most of our examples involve density or regression estimation settings, and we focus on the problem of estimating the unknown…

Risk of penalized least squares, greedy selection and ℓ1-penalization for flexible function libraries

This paper analyzes the performance of penalized least squares estimators through a theory of acceptable penalties, under which the estimator minimizing the empirical criterion has risk characterized by a corresponding population tradeoff between approximation error and penalty relative to the sample size.

Model Selection and Error Estimation

A tight relationship between error estimation and data-based complexity penalization is pointed out: any good error estimate may be converted into a data-based penalty function, and the performance of the estimate is governed by the quality of the error estimate.


Model Selection for Nonparametric Regression

A model complexity penalty term in AIC is incorporated to handle selection bias, and the resulting estimators are shown to achieve a trade-off among approximation error, estimation error, and model complexity without prior knowledge about the true regression function.

Model Selection for Regression

A model complexity penalty term in AIC is incorporated to handle the selection bias in regression estimation, and the resulting estimators are shown to achieve a trade-off among approximation error, estimation error, and model complexity automatically, without prior knowledge about the true regression function.
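
The entry above describes a generic mechanism: pick the model minimizing empirical loss plus a complexity penalty roughly proportional to the parameter count divided by the sample size. A minimal sketch of such a criterion, selecting a polynomial degree by penalized least squares; the function name, the weight `lam`, and the crude variance proxy are illustrative assumptions, not any paper's exact penalty:

```python
import numpy as np

def penalized_model_selection(x, y, max_degree=8, lam=2.0):
    """Choose a polynomial degree by penalized least squares.

    Criterion (a generic AIC-style sketch): mean squared residual
    plus lam * sigma2 * (d + 1) / n, where d + 1 is the number of
    fitted parameters and sigma2 is a crude noise-level proxy.
    """
    n = len(y)
    sigma2 = np.var(y)  # rough proxy; a real method would estimate noise
    best = None
    for d in range(max_degree + 1):
        coef = np.polyfit(x, y, d)
        resid = y - np.polyval(coef, x)
        crit = np.mean(resid ** 2) + lam * sigma2 * (d + 1) / n
        if best is None or crit < best[0]:
            best = (crit, d, coef)
    return best[1], best[2]
```

On data generated from a quadratic plus small noise, the penalty keeps the selected degree near the true one rather than overfitting to `max_degree`.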

Minimal Penalties for Gaussian Model Selection

A precise analysis is given of the penalties that should be used in order to perform model selection via the minimization of a penalized least-squares type criterion, within some general Gaussian framework including the classical ones.

Gaussian model selection

Our purpose in this paper is to provide a general approach to model selection via penalization for Gaussian regression and to develop our point of view about this subject. The advantage and…

Adaptive Model Selection Using Empirical Complexities

The estimates are shown to achieve a favorable tradeoff between approximation and estimation error, and to perform as well as if the distribution-dependent complexities of the model classes were known beforehand, even when each model class has infinite VC or pseudo-dimension.

Model selection for regression on a random design

We consider the problem of estimating an unknown regression function when the design is random with values in . Our estimation procedure is based on model selection and does not rely on any prior…

Minimax nonparametric classification - Part II: Model selection for adaptation

  • Yuhong Yang
  • Mathematics, Computer Science
    IEEE Trans. Inf. Theory
  • 1999
It is shown that with a suitable model selection criterion, the penalized maximum-likelihood estimator has a risk bounded by an index of resolvability expressing a good tradeoff among approximation error, estimation error, and model complexity.

Model selection in density estimation via cross-validation

The problem of model selection by cross-validation is addressed in the density estimation framework. Extensively used in practice, cross-validation (CV) remains poorly understood, especially in the…
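
As an illustration of least-squares cross-validation in density estimation, the following sketch selects the number of histogram bins via the standard leave-one-out LSCV identity; the function names and candidate grid are hypothetical, not the paper's procedure:

```python
import numpy as np

def lscv_score(data, n_bins, lo=0.0, hi=1.0):
    """Least-squares leave-one-out CV score for a histogram density
    estimate: the integral of fhat^2 minus (2/n) times the sum of the
    leave-one-out estimates at the data points (smaller is better)."""
    n = len(data)
    counts, _ = np.histogram(data, bins=n_bins, range=(lo, hi))
    h = (hi - lo) / n_bins
    int_f2 = np.sum(counts ** 2) / (n ** 2 * h)
    # sum over i of fhat_{-i}(X_i) = sum_j N_j (N_j - 1) / ((n - 1) h)
    loo_sum = np.sum(counts * (counts - 1)) / ((n - 1) * h)
    return int_f2 - 2.0 * loo_sum / n

def select_bins(data, candidates=range(2, 51)):
    """Pick the candidate bin count minimizing the LSCV score."""
    return min(candidates, key=lambda m: lscv_score(data, m))
```

The closed-form score avoids refitting the histogram n times, which is why LSCV is cheap for histograms even though it is formally a leave-one-out procedure.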

Model selection for regression on a fixed design

This work considers a collection of finite-dimensional linear spaces and the least-squares estimator built on a model selected from this collection in a data-driven way; adaptivity properties are deduced for the resulting estimator, which hold under mild moment conditions on the errors.

Minimum complexity regression estimation with weakly dependent observations

The minimum complexity regression estimation framework, due to Barron, is a general data-driven methodology for estimating a regression function from a given list of parametric models using…

On the Estimation of a Probability Density Function by the Maximum Penalized Likelihood Method

A class of probability density estimates can be obtained by penalizing the likelihood by a functional which depends on the roughness of the logarithm of the density. The limiting case of…

An asymptotic property of model selection criteria

  • Yuhong Yang, A. Barron
  • Computer Science, Mathematics
    Proceedings of 1994 Workshop on Information Theory and Statistics
  • 1994
The asymptotic risk of the density estimator is determined under conditions on the penalty term and shown to be minimax-optimal; the optimal rate of convergence is achieved for densities in certain smooth nonparametric families without knowing the smoothness parameters in advance.

Adaptive Spline Estimates for Nonparametric Regression Models

where the errors are independent standard Gaussian random variables, while the regressors x_i are deterministic and equally spaced, i.e., x_i = (2i-1)/(2n). We suppose that the unknown function f(·) is…

Rates of convergence for minimum contrast estimators

We shall present here a general study of minimum contrast estimators in a nonparametric setting (although our results are also valid in the classical parametric case) for independent…

Minimum contrast estimators on sieves: exponential bounds and rates of convergence

This paper, which we dedicate to Lucien Le Cam for his seventieth birthday, has been written in the spirit of his pioneering works on the relationships between the metric structure of the parameter…

Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications

  • D. Haussler
  • Mathematics, Computer Science
    Inf. Comput.
  • 1992

Minimax risk over ℓp-balls for ℓq-error

Consider estimating the mean vector θ from data N_n(θ, σ²I) with ℓq norm loss, q ≥ 1, when θ is known to lie in an n-dimensional ℓp ball, p ∈ (0, ∞). For large n, the ratio of minimax linear risk to minimax…

Wavelet Shrinkage: Asymptopia?

A method for curve estimation based on n noisy data is proposed: translate the empirical wavelet coefficients towards the origin by an amount √(2 log n)/√n. Loose parallels are drawn with near-optimality in robustness and with the broad near-eigenfunction properties of wavelets themselves.
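
The shrinkage rule quoted above can be sketched as soft thresholding with the stated amount √(2 log n)/√n; the function names and the σ parameter here are illustrative assumptions:

```python
import numpy as np

def soft_threshold(coeffs, t):
    """Translate coefficients toward the origin by t (soft thresholding):
    values within [-t, t] are set to zero, others move t closer to 0."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)

def shrink(empirical_coeffs, n, sigma=1.0):
    """Apply the universal-style threshold sigma * sqrt(2 log n) / sqrt(n),
    following the normalization quoted in the abstract snippet."""
    t = sigma * np.sqrt(2.0 * np.log(n)) / np.sqrt(n)
    return soft_threshold(empirical_coeffs, t)
```

Small coefficients, which are mostly noise, are zeroed out, while large coefficients survive nearly intact; this is the sense in which the rule "translates the empirical wavelet coefficients towards the origin."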