Corpus ID: 235458167

# Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and Benign Overfitting

@article{Koehler2021UniformCO,
title={Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and Benign Overfitting},
author={Frederic Koehler and Lijia Zhou and Danica J. Sutherland and Nathan Srebro},
journal={ArXiv},
year={2021},
volume={abs/2106.09276}
}
We consider interpolation learning in high-dimensional linear regression with Gaussian data, and prove a generic uniform convergence guarantee on the generalization error of interpolators in an arbitrary hypothesis class in terms of the class’s Gaussian width. Applying the generic bound to Euclidean norm balls recovers the consistency result of Bartlett et al. (2020) for minimum-norm interpolators, and confirms a prediction of Zhou et al. (2020) for near-minimal-norm interpolators in the…
2 Citations

A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning
• Computer Science, Mathematics
• ArXiv
• 2021
The rapid recent progress in machine learning (ML) has raised a number of scientific questions that challenge the longstanding dogma of the field. One of the most important riddles is the good…
The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks
• Computer Science, Mathematics
• ArXiv
• 2021
The recent success of neural network models has shone light on a rather surprising statistical phenomenon: statistical models that perfectly fit noisy data can generalize well to unseen test data.

#### References

Harmless interpolation of noisy data in regression
• Computer Science, Mathematics
• 2019 IEEE International Symposium on Information Theory (ISIT)
• 2019
A bound on how well such interpolative solutions can generalize to fresh test data is given, and it is shown that this bound generically decays to zero with the number of extra features, thus characterizing an explicit benefit of overparameterization.
In Defense of Uniform Convergence: Generalization via derandomization with an application to interpolating predictors
• Computer Science, Mathematics
• ICML
• 2020
The generalization error of a learned predictor $\hat h$ is studied in terms of that of a surrogate (potentially randomized) predictor that is coupled to $\hat h$ and designed to trade empirical risk for control of generalization error.
Uniform convergence may be unable to explain generalization in deep learning
• Computer Science, Mathematics
• NeurIPS
• 2019
Through numerous experiments, doubt is cast on the power of uniform convergence-based generalization bounds to provide a complete picture of why overparameterized deep networks generalize well.
Surprises in High-Dimensional Ridgeless Least Squares Interpolation
• Mathematics, Computer Science
• ArXiv
• 2019
This paper recovers, in a precise quantitative way, several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk, and the potential benefits of overparametrization.
Benign overfitting in linear regression
• Computer Science, Mathematics
• Proceedings of the National Academy of Sciences
• 2020
A characterization of linear regression problems for which the minimum norm interpolating prediction rule has near-optimal prediction accuracy shows that overparameterization is essential for benign overfitting in this setting: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size.
Learning without Concentration
We obtain sharp bounds on the estimation error of the Empirical Risk Minimization procedure, performed in a convex class and with respect to the squared loss, without assuming that class members and…
A Model of Double Descent for High-dimensional Binary Linear Classification
• Mathematics, Computer Science
• ArXiv
• 2019
A model for logistic regression where only a subset of features of size p is used for training a linear classifier over n training samples is considered, and a phase-transition phenomenon for the case of Gaussian regressors is uncovered.
Overfitting Can Be Harmless for Basis Pursuit: Only to a Degree
• Computer Science, Mathematics
• ArXiv
• 2020
To the best of the authors' knowledge, this is the first result in the literature showing that, without any explicit regularization, the test errors of a practical-to-compute overfitting solution can exhibit double descent and approach the order of the noise level independently of the null risk.
The Convex Geometry of Linear Inverse Problems
• Mathematics, Computer Science
• Found. Comput. Math.
• 2012
This paper provides a general framework to convert notions of simplicity into convex penalty functions, resulting in convex optimization solutions to linear, underdetermined inverse problems.
Reconciling modern machine learning practice and the bias-variance trade-off
• Computer Science
• 2018
This paper reconciles the classical understanding and the modern practice within a unified performance curve that subsumes the textbook U-shaped bias-variance trade-off curve by showing how increasing model capacity beyond the point of interpolation results in improved performance.