# Failures of model-dependent generalization bounds for least-norm interpolation

```bibtex
@article{Bartlett2021FailuresOM,
  title   = {Failures of model-dependent generalization bounds for least-norm interpolation},
  author  = {Peter L. Bartlett and Philip M. Long},
  journal = {J. Mach. Learn. Res.},
  year    = {2021},
  volume  = {22},
  pages   = {204:1-204:15}
}
```

We consider bounds on the generalization performance of the least-norm linear regressor, in the over-parameterized regime where it can interpolate the data. We describe a sense in which any generalization bound of a type that is commonly proved in statistical learning theory must sometimes be very loose when applied to analyze the least-norm interpolant. In particular, for a variety of natural joint distributions on training examples, any valid generalization bound that depends only on the…
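As a concrete illustration (not taken from the paper), the least-norm interpolant studied here can be computed with a Moore-Penrose pseudoinverse: in the overparameterized regime (more features than samples), among all weight vectors that fit the training data exactly, it is the one of minimum Euclidean norm. A minimal sketch with synthetic Gaussian data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100          # overparameterized regime: d > n
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Least-norm interpolant: among all w satisfying X @ w = y, the
# minimum-norm solution is w = X^+ y (Moore-Penrose pseudoinverse).
w = np.linalg.pinv(X) @ y

# With d > n and generic data, it interpolates the training set exactly
# (up to floating-point error).
print(np.allclose(X @ w, y))  # True
```

The paper's point is that any generalization bound depending only on properties of this interpolant (e.g. its norm) must sometimes be very loose, even when the interpolant itself predicts well.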

## 14 Citations

Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and Benign Overfitting

- Computer Science, NeurIPS
- 2021

We consider interpolation learning in high-dimensional linear regression with Gaussian data, and prove a generic uniform convergence guarantee on the generalization error of interpolators in an…

Benign Underfitting of SGD in Stochastic Convex Optimization

- Computer Science
- 2022

It turns out that SGD is not algorithmically stable in any sense, and its generalization ability cannot be explained by uniform convergence or any other currently known generalization bound technique for that matter (other than that of its classical analysis).

The Implicit Bias of Benign Overfitting

- Computer Science, ArXiv
- 2022

This paper proposes a prototypical and rather generic data model for benign overfitting of linear predictors, where an arbitrary input distribution of some fixed dimension k is concatenated with a high-dimensional distribution, and proves that the max-margin predictor is asymptotically biased towards minimizing a weighted squared hinge loss.

A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning

- Computer Science, ArXiv
- 2021

This paper provides a succinct overview of this emerging theory of overparameterized ML (henceforth abbreviated as TOPML) that explains these recent findings through a statistical signal processing perspective and emphasizes the unique aspects that define the TOPML research area as a subfield of modern ML theory.

Benign Overfitting in Multiclass Classification: All Roads Lead to Interpolation

- Computer Science, NeurIPS
- 2021

This analysis shows that good generalization is possible for SVM solutions beyond the realm in which typical margin-based bounds apply, and derives novel error bounds on the accuracy of the MNI classifier.

Classification and Adversarial examples in an Overparameterized Linear Model: A Signal Processing Perspective

- Computer Science, ArXiv
- 2021

An overparameterized linear ensemble using the “lifted” Fourier feature map demonstrates adversarial susceptibility, and classification with these features can be easier than in the more commonly studied “independent feature” models.

Deep learning theory lecture notes

- Physics
- 2021

Table-of-contents excerpt: Preface; Basic setup: feedforward networks and test error decomposition; Highlights; …

Exact Gap between Generalization Error and Uniform Convergence in Random Feature Models

- Mathematics, Computer Science, ICML
- 2021

This work shows that, in the setting where the classical uniform convergence bound is vacuous, uniform convergence over the interpolators still gives a non-trivial bound on the test error of interpolating solutions. This provides a first exact comparison between test errors and uniform convergence bounds for interpolators beyond simple linear models.

Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation

- Computer Science, Acta Numerica
- 2021

Just as a physical prism separates colours mixed within a ray of light, the figurative prism of interpolation helps to disentangle generalization and optimization properties within the complex picture of modern machine learning.

## References

Showing 1–10 of 45 references

Does learning require memorization? a short tale about a long tail

- Computer Science, STOC
- 2020

The model makes it possible to quantify the effect of not fitting the training data on the generalization performance of the learned classifier, demonstrates that memorization is necessary whenever frequencies are long-tailed, and establishes a formal link between these empirical phenomena.

Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks

- Computer Science, NeurIPS
- 2019

The expected $0$-$1$ loss of a sufficiently wide ReLU network trained with stochastic gradient descent and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, called the neural tangent random feature (NTRF) model.

Testing that distributions are close

- Computer Science, Proceedings 41st Annual Symposium on Foundations of Computer Science
- 2000

A sublinear algorithm which uses O(n^{2/3} ε^{-4} log n) independent samples from each distribution, runs in time linear in the sample size, makes no assumptions about the structure of the distributions, and distinguishes the cases when the distance between the distributions is small or large.

Benign overfitting in linear regression

- Computer Science, Proceedings of the National Academy of Sciences
- 2020

A characterization of linear regression problems for which the minimum norm interpolating prediction rule has near-optimal prediction accuracy shows that overparameterization is essential for benign overfitting in this setting: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size.

Spectrally-normalized margin bounds for neural networks

- Computer Science, NIPS
- 2017

This bound is empirically investigated for a standard AlexNet network trained with SGD on the MNIST and CIFAR-10 datasets, with both original and random labels; the bound, the Lipschitz constants, and the excess risks are all in direct correlation, suggesting both that SGD selects predictors whose complexity scales with the difficulty of the learning task, and that the presented bound is sensitive to this complexity.

Surprises in High-Dimensional Ridgeless Least Squares Interpolation

- Computer Science, The Annals of Statistics
- 2022

This paper recovers---in a precise quantitative way---several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk, and the potential benefits of overparametrization.

Benign Overfitting and Noisy Features

- Computer Science, ArXiv
- 2020

The noise residing in random features plays an important implicit regularization role in the phenomenon of benign overfitting, motivating a new view of random feature models.

Benign overfitting in ridge regression

- Computer Science
- 2020

This work provides non-asymptotic generalization bounds for overparametrized ridge regression that depend on the arbitrary covariance structure of the data, and shows that those bounds are tight for a range of regularization parameter values.

Exact expressions for double descent and implicit regularization via surrogate random design

- Computer Science, Mathematics, NeurIPS
- 2020

This work provides the first exact non-asymptotic expressions for double descent of the minimum norm linear estimator and introduces a new mathematical tool of independent interest: the class of random matrices for which determinant commutes with expectation.

Generalization bounds for deep convolutional neural networks

- Computer Science, ICLR
- 2020

Bounds on the generalization error of convolutional networks are proved in terms of the training loss, the number of parameters, the Lipschitz constant of the loss, and the distance from the weights to the initial weights.