• Corpus ID: 235458167

# Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and Benign Overfitting

@inproceedings{Koehler2021UniformCO,
  title={Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and Benign Overfitting},
  author={Frederic Koehler and Lijia Zhou and Danica J. Sutherland and Nathan Srebro},
  booktitle={NeurIPS},
  year={2021}
}
• Published in NeurIPS 17 June 2021
• Computer Science
We consider interpolation learning in high-dimensional linear regression with Gaussian data, and prove a generic uniform convergence guarantee on the generalization error of interpolators in an arbitrary hypothesis class in terms of the class’s Gaussian width. Applying the generic bound to Euclidean norm balls recovers the consistency result of Bartlett et al. (2020) for minimum-norm interpolators, and confirms a prediction of Zhou et al. (2020) for near-minimal-norm interpolators in the…
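For context, the Gaussian width invoked in the abstract is the standard complexity measure for a set of predictors; a minimal statement of the definition (standard notation, not taken from the paper itself):

```latex
% Gaussian width of a set K \subseteq \mathbb{R}^d:
% the expected maximal alignment of K with a standard Gaussian vector.
w(K) = \mathbb{E}_{g \sim \mathcal{N}(0, I_d)} \left[ \sup_{v \in K} \langle g, v \rangle \right]
```

Intuitively, $w(K)$ measures the effective size of the hypothesis class $K$ as seen by Gaussian data, which is why it can control uniform convergence over the interpolators in $K$.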

## Citations

Tight bounds for minimum l1-norm interpolation of noisy data
• Computer Science
AISTATS
• 2022
This work complements the literature on “benign overfitting” for minimum $\ell_2$-norm interpolation, where asymptotic consistency can be achieved only when the features are effectively low-dimensional.
Optimistic Rates: A Unifying Theory for Interpolation Learning and Regularization in Linear Regression
• Computer Science
ArXiv
• 2021
The optimistic rate bound is studied for linear regression with Gaussian data to recover some classical statistical guarantees for ridge and LASSO regression under random designs, and helps to obtain a precise understanding of the excess risk of near-interpolators in the over-parameterized regime.
Fast rates for noisy interpolation require rethinking the effects of inductive bias
• Computer Science
ICML
• 2022
This paper proves that minimum $\ell_p$-norm and maximum $\ell_p$-margin interpolators achieve fast polynomial rates up to order $1/n$ for $p > 1$, compared to a logarithmic rate for $p = 1$, and provides experimental evidence that this trade-off may also play a crucial role in understanding non-linear interpolating models used in practice.
A geometrical viewpoint on the benign overfitting property of the minimum $l_2$-norm interpolant estimator
• Computer Science
• 2022
The Dvoretzky dimension appearing naturally in the authors' geometrical viewpoint coincides with the effective rank from [1, 39] and is the key tool for handling the behavior of the design matrix restricted to the subspace $V_{k+1:p}$ where overfitting happens.
Foolish Crowds Support Benign Overfitting
• Computer Science
ArXiv
• 2021
A lower bound is proved on the excess risk of sparse interpolating procedures for linear regression with Gaussian data in the overparameterized regime that implies that its excess risk can converge at an exponentially slower rate than OLS, even when the ground truth is sparse.
Max-Margin Works while Large Margin Fails: Generalization without Uniform Convergence
• Computer Science
ArXiv
• 2022
This analysis provides insight on why memorization can coexist with generalization: in this challenging regime where generalization occurs but UC fails, near-max-margin classifiers simultaneously contain some generalizable components and some overfitting components that memorize the data.
Kernel interpolation in Sobolev spaces is not consistent in low dimensions
Sharp bounds are derived on the spectrum of random kernel matrices using results from the theory of radial basis functions which might be of independent interest for kernels whose associated RKHS is a Sobolev space.
Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data
• Computer Science
COLT
• 2022
This work considers the generalization error of two-layer neural networks trained to interpolation by gradient descent on the logistic loss following random initialization and shows that in this setting, neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly matching any noisy training labels, and simultaneously achieve test error close to the Bayes-optimal error.
Benign, Tempered, or Catastrophic: A Taxonomy of Overfitting
• Computer Science
• 2022
This work argues that while benign overfitting has been instructive and fruitful to study, many real interpolating methods, like neural networks, do not fit benignly: modest noise in the training set causes nonzero excess risk at test time, implying these models are neither benign nor catastrophic but rather fall in an intermediate regime.
A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning
• Computer Science
ArXiv
• 2021
This paper provides a succinct overview of this emerging theory of overparameterized ML (henceforth abbreviated as TOPML) that explains these recent findings through a statistical signal processing perspective and emphasizes the unique aspects that define the TOPML research area as a subfield of modern ML theory.

## References

Showing 1–10 of 44 references
On Uniform Convergence and Low-Norm Interpolation Learning
• Computer Science
NeurIPS
• 2020
This work argues it can explain the consistency of the minimal-norm interpolator with a slightly weaker, yet standard, notion, uniform convergence of zero-error predictors, and uses this to bound the generalization error of low- (but not minimal-) norm interpolating predictors.
Exact Gap between Generalization Error and Uniform Convergence in Random Feature Models
• Mathematics, Computer Science
ICML
• 2021
This work shows that, in the setting where the classical uniform convergence bound is vacuous, uniform convergence over the interpolators still gives a non-trivial bound on the test error of interpolating solutions, providing a first exact comparison between test errors and uniform convergence bounds for interpolators beyond simple linear models.
Failures of model-dependent generalization bounds for least-norm interpolation
• Computer Science
J. Mach. Learn. Res.
• 2021
Any valid generalization bound of a type that is commonly proved in statistical learning theory must sometimes be very loose when applied to analyze the least-norm interpolant, for a variety of natural joint distributions on training examples.
Harmless interpolation of noisy data in regression
• Computer Science
2019 IEEE International Symposium on Information Theory (ISIT)
• 2019
A bound on how well such interpolative solutions can generalize to fresh test data is given, and it is shown that this bound generically decays to zero with the number of extra features, thus characterizing an explicit benefit of overparameterization.
In Defense of Uniform Convergence: Generalization via derandomization with an application to interpolating predictors
• Computer Science, Mathematics
ICML
• 2020
The generalization error of a learned predictor $\hat h$ is studied in terms of that of a surrogate (potentially randomized) predictor that is coupled to $\hat h$ and designed to trade empirical risk for control of generalization error.
Uniform convergence may be unable to explain generalization in deep learning
• Computer Science
NeurIPS
• 2019
Through numerous experiments, doubt is cast on the power of uniform convergence-based generalization bounds to provide a complete picture of why overparameterized deep networks generalize well.
Benign overfitting in ridge regression
• Computer Science
• 2020
This work provides non-asymptotic generalization bounds for overparametrized ridge regression that depend on the arbitrary covariance structure of the data, and shows that those bounds are tight for a range of regularization parameter values.
Surprises in High-Dimensional Ridgeless Least Squares Interpolation
• Computer Science
The Annals of Statistics
• 2022
This paper recovers, in a precise quantitative way, several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk and the potential benefits of overparametrization.
Regularized Linear Regression: A Precise Analysis of the Estimation Error
• Computer Science
COLT
• 2015
This paper focuses on the problem of linear regression and considers a general class of optimization methods that minimize a loss function measuring the misfit of the model to the observations, with an added structure-inducing regularization term.
Benign overfitting in linear regression
• Computer Science
Proceedings of the National Academy of Sciences
• 2020
A characterization of linear regression problems for which the minimum norm interpolating prediction rule has near-optimal prediction accuracy shows that overparameterization is essential for benign overfitting in this setting: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size.