Corpus ID: 223974184

Failures of model-dependent generalization bounds for least-norm interpolation

@article{Bartlett2021FailuresOM,
  title={Failures of model-dependent generalization bounds for least-norm interpolation},
  author={Peter L. Bartlett and Philip M. Long},
  journal={J. Mach. Learn. Res.},
  year={2021},
  volume={22},
  pages={204:1-204:15}
}
We consider bounds on the generalization performance of the least-norm linear regressor, in the over-parameterized regime where it can interpolate the data. We describe a sense in which any generalization bound of a type that is commonly proved in statistical learning theory must sometimes be very loose when applied to analyze the least-norm interpolant. In particular, for a variety of natural joint distributions on training examples, any valid generalization bound that depends only on the… 
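To fix ideas, here is a minimal sketch of the least-norm interpolant in the over-parameterized regime (a numpy illustration on assumed synthetic data, not the paper's setting): with more features than samples, the minimum-$\ell_2$-norm solution of $X\theta = y$ is $\hat\theta = X^\top(XX^\top)^{-1}y = X^{+}y$, and it fits the training data exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 500  # over-parameterized: d > n

# Synthetic training data (illustrative only; the paper considers
# general joint distributions on examples).
X = rng.standard_normal((n, d))
theta_star = rng.standard_normal(d) / np.sqrt(d)
y = X @ theta_star + 0.1 * rng.standard_normal(n)

# Least-norm interpolant: the minimum-l2-norm solution of X theta = y,
# i.e. theta_hat = X^T (X X^T)^{-1} y, computed here via the pseudoinverse.
theta_hat = np.linalg.pinv(X) @ y

# It interpolates: the training residual is (numerically) zero.
print("max train residual:", np.max(np.abs(X @ theta_hat - y)))
print("norm of interpolant:", np.linalg.norm(theta_hat))
```

The pseudoinverse is used above for clarity; np.linalg.lstsq returns the same minimum-norm solution for underdetermined systems and is usually the more numerically robust choice.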
Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and Benign Overfitting
We consider interpolation learning in high-dimensional linear regression with Gaussian data, and prove a generic uniform convergence guarantee on the generalization error of interpolators in an…
Benign Underfitting of SGD in Stochastic Convex Optimization
TLDR
It turns out that SGD is not algorithmically stable in any sense, and its generalization ability cannot be explained by uniform convergence or any other currently known generalization bound technique for that matter (other than that of its classical analysis).
The Implicit Bias of Benign Overfitting
TLDR
This paper proposes a prototypical and rather generic data model for benign overfitting of linear predictors, where an arbitrary input distribution of some fixed dimension k is concatenated with a high-dimensional distribution and proves that the max-margin predictor is asymptotically biased towards minimizing a weighted squared hinge loss.
A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning
TLDR
This paper provides a succinct overview of this emerging theory of overparameterized ML (henceforth abbreviated as TOPML) that explains these recent findings through a statistical signal processing perspective and emphasizes the unique aspects that define the TOPML research area as a subfield of modern ML theory.
Benign Overfitting in Multiclass Classification: All Roads Lead to Interpolation
TLDR
This analysis shows that good generalization is possible for SVM solutions beyond the realm in which typical margin-based bounds apply, and derives novel error bounds on the accuracy of the minimum-norm interpolating (MNI) classifier.
Classification and Adversarial examples in an Overparameterized Linear Model: A Signal Processing Perspective
TLDR
An overparameterized linear ensemble that uses the "lifted" Fourier feature map is analyzed; it demonstrates both adversarial susceptibility and that classification with these features can be easier than with the more commonly studied "independent feature" models.
Deep learning theory lecture notes
Contents excerpt: Preface; Basic setup: feedforward networks and test error decomposition; Highlights.
Exact Gap between Generalization Error and Uniform Convergence in Random Feature Models
TLDR
This work shows that, in the setting where the classical uniform convergence bound is vacuous, uniform convergence over the interpolators still gives a non-trivial bound on the test error of interpolating solutions, which provides a first exact comparison between test errors and uniform convergence bounds for interpolators beyond simple linear models.
Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation
TLDR
Just as a physical prism separates colours mixed within a ray of light, the figurative prism of interpolation helps to disentangle generalization and optimization properties within the complex picture of modern machine learning.

References

Showing 1-10 of 45 references
Does learning require memorization? a short tale about a long tail
TLDR
The model makes it possible to quantify the effect of not fitting the training data on the generalization performance of the learned classifier, demonstrates that memorization is necessary whenever frequencies are long-tailed, and establishes a formal link between these empirical phenomena.
Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks
TLDR
The expected $0$-$1$ loss of a wide enough ReLU network trained with stochastic gradient descent and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which is called a neural tangent random feature (NTRF) model.
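As a rough illustration of the NTRF idea (a hypothetical numpy sketch under simplified assumptions, not the authors' construction or their bound): treat the gradient of a randomly initialized two-layer ReLU network with respect to its parameters as a fixed feature map, then fit a linear model on those features.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 100, 20, 200  # samples, input dim, hidden width

# Random data and binary labels (illustrative only).
X = rng.standard_normal((n, d))
y = np.sign(rng.standard_normal(n))

# Random initialization of a two-layer ReLU net f(x) = a^T relu(W x) / sqrt(m).
W = rng.standard_normal((m, d))
a = rng.choice([-1.0, 1.0], size=m)

def ntrf_features(X):
    """Gradient of f at initialization, stacked into one feature vector per example."""
    pre = X @ W.T                       # (n, m) pre-activations
    act = np.maximum(pre, 0.0)          # ReLU
    ind = (pre > 0).astype(float)       # ReLU derivative
    grad_a = act / np.sqrt(m)                                     # d f / d a_j
    grad_W = (a * ind)[:, :, None] * X[:, None, :] / np.sqrt(m)   # d f / d W_j
    return np.concatenate([grad_a, grad_W.reshape(len(X), -1)], axis=1)

# Linear least-squares fit in NTRF feature space.
Phi = ntrf_features(X)
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("train 0-1 error:", np.mean(np.sign(Phi @ theta) != y))
```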
Testing that distributions are close
TLDR
A sublinear algorithm which uses $O(n^{2/3}\epsilon^{-4}\log n)$ independent samples from each distribution, runs in time linear in the sample size, makes no assumptions about the structure of the distributions, and distinguishes the cases when the distance between the distributions is small or large.
Benign overfitting in linear regression
TLDR
A characterization of linear regression problems for which the minimum norm interpolating prediction rule has near-optimal prediction accuracy shows that overparameterization is essential for benign overfitting in this setting: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size.
Spectrally-normalized margin bounds for neural networks
TLDR
This bound is empirically investigated for a standard AlexNet network trained with SGD on the MNIST and CIFAR-10 datasets, with both original and random labels; the bound, the Lipschitz constants, and the excess risks are all in direct correlation, suggesting both that SGD selects predictors whose complexity scales with the difficulty of the learning task, and that the presented bound is sensitive to this complexity.
Surprises in High-Dimensional Ridgeless Least Squares Interpolation
TLDR
This paper recovers, in a precise quantitative way, several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk, and the potential benefits of overparametrization.
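The qualitative double-descent shape is easy to reproduce in a toy simulation (an illustrative setup with assumed dimensions and noise level, not the paper's asymptotic analysis): the test risk of the minimum-norm least-squares fit peaks near the interpolation threshold d ≈ n and drops again as the number of features grows.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_test, d_max, sigma = 40, 2000, 400, 0.5

def risk_for_dim(d, trials=20):
    """Average test risk of the min-norm least-squares fit using only the first d features."""
    errs = []
    for _ in range(trials):
        beta = rng.standard_normal(d_max) / np.sqrt(d_max)  # true signal over all d_max features
        X = rng.standard_normal((n, d_max))
        Xt = rng.standard_normal((n_test, d_max))
        y = X @ beta + sigma * rng.standard_normal(n)
        yt = Xt @ beta + sigma * rng.standard_normal(n_test)
        # Min-norm least squares on the first d features (misspecified when d < d_max).
        theta, *_ = np.linalg.lstsq(X[:, :d], y, rcond=None)
        errs.append(np.mean((Xt[:, :d] @ theta - yt) ** 2))
    return np.mean(errs)

for d in (10, 20, 35, 40, 45, 80, 200, 400):
    print(f"d = {d:3d}  test risk ~ {risk_for_dim(d):.3f}")
```

The peak near d = n reflects the near-singular design matrix; well beyond it, the minimum-norm solution is implicitly regularized, which is the qualitative benefit of overparametrization referred to above.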
Benign Overfitting and Noisy Features
TLDR
A new view of random feature models is adopted, in which the noise that resides in random features plays an important implicit regularization role in the phenomenon of benign overfitting.
Benign overfitting in ridge regression
TLDR
This work provides non-asymptotic generalization bounds for overparametrized ridge regression that depend on the arbitrary covariance structure of the data, and shows that those bounds are tight for a range of regularization parameter values.
Exact expressions for double descent and implicit regularization via surrogate random design
TLDR
This work provides the first exact non-asymptotic expressions for double descent of the minimum norm linear estimator and introduces a new mathematical tool of independent interest: the class of random matrices for which determinant commutes with expectation.
Generalization bounds for deep convolutional neural networks
TLDR
Bounds on the generalization error of convolutional networks are proved in terms of the training loss, the number of parameters, the Lipschitz constant of the loss and the distance from the weights to the initial weights.