# On the Minimal Error of Empirical Risk Minimization

```bibtex
@article{Kur2021OnTM,
  title   = {On the Minimal Error of Empirical Risk Minimization},
  author  = {Gil Kur and Alexander Rakhlin},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2102.12066}
}
```

We study the minimal error of the Empirical Risk Minimization (ERM) procedure in the task of regression, both in the random and the fixed design settings. Our sharp lower bounds shed light on the possibility (or impossibility) of adapting to simplicity of the model generating the data. In the fixed design setting, we show that the error is governed by the global complexity of the entire class. In contrast, in random design, ERM may only adapt to simpler models if the local neighborhoods around…
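As a point of reference for the procedure studied above, here is a minimal, generic sketch of ERM for regression with squared loss: the estimator returns the function in a candidate class that minimizes the empirical risk on the sample. The toy class of linear functions and all parameter choices below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def empirical_risk(f, X, y):
    """Average squared loss of f on the sample (the empirical risk)."""
    return np.mean((f(X) - y) ** 2)

def erm(candidates, X, y):
    """Return the candidate in the class minimizing empirical risk."""
    return min(candidates, key=lambda f: empirical_risk(f, X, y))

# Toy example: data from a simple linear model with Gaussian noise,
# and a small finite class of candidate slopes.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=200)
y = 0.5 * X + rng.normal(scale=0.1, size=200)

candidates = [lambda x, a=a: a * x for a in np.linspace(-1, 1, 21)]
f_hat = erm(candidates, X, y)
print(f_hat(1.0))  # slope selected by ERM, should be near the true 0.5
```

The paper's question is how small the error of such an estimator can be, and whether it improves when the data-generating model is simpler than the worst case over the class.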

## One Citation

### Efficient Minimax Optimal Estimators For Multivariate Convex Regression

- Computer Science, Mathematics · COLT
- 2022

This work is the first to show the existence of efficient minimax optimal estimators for non-Donsker classes whose corresponding Least Squares Estimators are provably minimax sub-optimal; a result of independent interest.

## References

Showing 1–10 of 39 references

### A new perspective on least squares under convex constraint

- Mathematics, Computer Science
- 2014

This paper presents three general results about the problem of estimating the mean of a Gaussian random vector: an exact computation of the main term in the estimation error, obtained by relating it to expected maxima of Gaussian processes; a theorem showing that the least squares estimator is always admissible up to a universal constant in any problem of this kind; and a counterexample showing that least squares estimation may not always be minimax rate-optimal.

### Empirical Processes in M-estimation, volume 6

- Cambridge University Press
- 2000

### Rates of convergence for minimum contrast estimators

- Mathematics
- 1993

We shall present here a general study of minimum contrast estimators in a nonparametric setting (although our results are also valid in the classical parametric case) for independent…

### Adaptation in multivariate log-concave density estimation

- Mathematics · The Annals of Statistics
- 2021

We study the adaptation properties of the multivariate log-concave maximum likelihood estimator over two subclasses of log-concave densities. The first consists of densities with polyhedral support…

### Isotonic regression in general dimensions

- Mathematics · The Annals of Statistics
- 2019

We study the least squares regression function estimator over the class of real-valued functions on $[0,1]^d$ that are increasing in each coordinate. For uniformly bounded signals and with a fixed,…

### Mathematical Foundations of Infinite-Dimensional Statistical Models

- Mathematics, Computer Science
- 2015

This chapter discusses nonparametric statistical models, function spaces and approximation theory, and the minimax paradigm, which aims to provide a model for adaptive inference of likelihood-based procedures.

### Concentration Inequalities - A Nonasymptotic Theory of Independence

- Mathematics · Concentration Inequalities
- 2013

Deep connections with isoperimetric problems are revealed whilst special attention is paid to applications to the supremum of empirical processes.

### On Suboptimality of Least Squares with Application to Estimation of Convex Bodies

- Mathematics, Computer Science · COLT
- 2020

It is established that Least Squares is minimax sub-optimal, achieving a rate of $\tilde{\Theta}_d(n^{-2/(d-1)})$, whereas the minimax rate is $\Theta_d(n^{-4/(d+3)})$.

### Convex Regression in Multidimensions: Suboptimality of Least Squares Estimators

- Mathematics
- 2020

The least squares estimator (LSE) is shown to be suboptimal in squared error loss in the usual nonparametric regression model with Gaussian errors for $d \geq 5$ for each of the following families of…

### Benign overfitting in ridge regression

- Computer Science
- 2020

This work provides non-asymptotic generalization bounds for overparametrized ridge regression that depend on the arbitrary covariance structure of the data, and shows that those bounds are tight for a range of regularization parameter values.