# Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and Benign Overfitting

```bibtex
@inproceedings{Koehler2021UniformCO,
  title     = {Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and Benign Overfitting},
  author    = {Frederic Koehler and Lijia Zhou and Danica J. Sutherland and Nathan Srebro},
  booktitle = {NeurIPS},
  year      = {2021}
}
```

We consider interpolation learning in high-dimensional linear regression with Gaussian data, and prove a generic uniform convergence guarantee on the generalization error of interpolators in an arbitrary hypothesis class in terms of the class’s Gaussian width. Applying the generic bound to Euclidean norm balls recovers the consistency result of Bartlett et al. (2020) for minimum-norm interpolators, and confirms a prediction of Zhou et al. (2020) for near-minimal-norm interpolators in the…
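As a concrete illustration of the setting (a minimal sketch, not code from the paper; the dimensions, noise level, and ground truth are arbitrary choices), the minimum Euclidean-norm interpolator studied in the abstract has a closed form in the overparameterized regime:

```python
# Minimum l2-norm interpolator in an overparameterized Gaussian linear model.
# Illustrative sketch only: dimensions and noise level are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 500                                   # n samples, d >> n features
X = rng.standard_normal((n, d))                  # isotropic Gaussian design
w_star = np.zeros(d)
w_star[0] = 1.0                                  # hypothetical ground truth
y = X @ w_star + 0.1 * rng.standard_normal(n)    # noisy labels

# Minimum-norm solution of X w = y:  w_hat = X^T (X X^T)^{-1} y
w_hat = X.T @ np.linalg.solve(X @ X.T, y)

train_err = np.max(np.abs(X @ w_hat - y))        # ~0: w_hat interpolates
print(train_err, np.linalg.norm(w_hat))
```

The norm of `w_hat` is the quantity that a uniform convergence bound over a norm ball must contend with: the paper's generic bound controls the generalization error of all interpolators in such a ball via the ball's Gaussian width.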


## 14 Citations

Tight bounds for minimum l1-norm interpolation of noisy data

- Computer Science, AISTATS
- 2022

This work complements the literature on "benign overfitting" for minimum ℓ2-norm interpolation, where asymptotic consistency can be achieved only when the features are effectively low-dimensional.

Optimistic Rates: A Unifying Theory for Interpolation Learning and Regularization in Linear Regression

- Computer Science, ArXiv
- 2021

The optimistic rate bound is studied for linear regression with Gaussian data to recover some classical statistical guarantees for ridge and LASSO regression under random designs, and helps to obtain a precise understanding of the excess risk of near-interpolators in the over-parameterized regime.

Fast rates for noisy interpolation require rethinking the effects of inductive bias

- Computer Science, ICML
- 2022

This paper proves that minimum ℓp-norm and maximum ℓp-margin interpolators achieve fast polynomial rates up to order 1/n for p > 1, compared to a logarithmic rate for p = 1, and provides experimental evidence that this trade-off may also play a crucial role in understanding non-linear interpolating models used in practice.

A geometrical viewpoint on the benign overfitting property of the minimum $l_2$-norm interpolant estimator

- Computer Science
- 2022

The Dvoretzky dimension appearing naturally in the authors' geometrical viewpoint coincides with the effective rank from [1, 39] and is the key tool to handle the behavior of the design matrix restricted to the subspace $V_{k+1:p}$ where overfitting happens.

Foolish Crowds Support Benign Overfitting

- Computer Science, ArXiv
- 2021

A lower bound is proved on the excess risk of sparse interpolating procedures for linear regression with Gaussian data in the overparameterized regime that implies that its excess risk can converge at an exponentially slower rate than OLS, even when the ground truth is sparse.

Max-Margin Works while Large Margin Fails: Generalization without Uniform Convergence

- Computer Science, ArXiv
- 2022

This analysis provides insight on why memorization can coexist with generalization: in this challenging regime where generalization occurs but UC fails, near-max-margin classifiers simultaneously contain some generalizable components and some overfitting components that memorize the data.

Kernel interpolation in Sobolev spaces is not consistent in low dimensions

- Computer Science, Mathematics, COLT
- 2022

Sharp bounds are derived on the spectrum of random kernel matrices using results from the theory of radial basis functions which might be of independent interest for kernels whose associated RKHS is a Sobolev space.

Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data

- Computer Science, COLT
- 2022

This work considers the generalization error of two-layer neural networks trained to interpolation by gradient descent on the logistic loss following random initialization, and shows that in this setting neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly matching any noisy training labels, and simultaneously achieve test error close to the Bayes-optimal error.

Benign, Tempered, or Catastrophic: A Taxonomy of Overfitting

- Computer Science
- 2022

This work argues that while benign overfitting has been instructive and fruitful to study, many real interpolating methods like neural networks do not fit benignly: modest noise in the training set causes nonzero excess risk at test time, implying these models are neither benign nor catastrophic but rather fall in an intermediate regime.

A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning

- Computer Science, ArXiv
- 2021

This paper provides a succinct overview of this emerging theory of overparameterized ML (henceforth abbreviated as TOPML) that explains these recent findings through a statistical signal processing perspective and emphasizes the unique aspects that define the TOPML research area as a subfield of modern ML theory.

## References

Showing 1–10 of 44 references

On Uniform Convergence and Low-Norm Interpolation Learning

- Computer Science, NeurIPS
- 2020

This work argues that the consistency of the minimal-norm interpolator can be explained with a slightly weaker, yet standard, notion: uniform convergence of zero-error predictors, and uses this to bound the generalization error of low- (but not minimal-) norm interpolating predictors.

Exact Gap between Generalization Error and Uniform Convergence in Random Feature Models

- Mathematics, Computer Science, ICML
- 2021

This work shows that, in a setting where the classical uniform convergence bound is vacuous, uniform convergence over the interpolators still gives a non-trivial bound on the test error of interpolating solutions, providing a first exact comparison between test errors and uniform convergence bounds for interpolators beyond simple linear models.

Failures of model-dependent generalization bounds for least-norm interpolation

- Computer Science, J. Mach. Learn. Res.
- 2021

Any valid generalization bound of a type that is commonly proved in statistical learning theory must sometimes be very loose when applied to analyze the least-norm interpolant, for a variety of natural joint distributions on training examples.

Harmless interpolation of noisy data in regression

- Computer Science, 2019 IEEE International Symposium on Information Theory (ISIT)
- 2019

A bound on how well such interpolative solutions can generalize to fresh test data is given, and it is shown that this bound generically decays to zero with the number of extra features, thus characterizing an explicit benefit of overparameterization.

In Defense of Uniform Convergence: Generalization via derandomization with an application to interpolating predictors

- Computer Science, Mathematics, ICML
- 2020

The generalization error of a learned predictor $\hat h$ is studied in terms of that of a surrogate (potentially randomized) predictor that is coupled to $\hat h$ and designed to trade empirical risk for control of generalization error.

Uniform convergence may be unable to explain generalization in deep learning

- Computer Science, NeurIPS
- 2019

Through numerous experiments, doubt is cast on the power of uniform convergence-based generalization bounds to provide a complete picture of why overparameterized deep networks generalize well.

Benign overfitting in ridge regression

- Computer Science
- 2020

This work provides non-asymptotic generalization bounds for overparametrized ridge regression that depend on the arbitrary covariance structure of the data, and shows that those bounds are tight for a range of regularization parameter values.

Surprises in High-Dimensional Ridgeless Least Squares Interpolation

- Computer Science, The Annals of Statistics
- 2022

This paper recovers, in a precise quantitative way, several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk and the potential benefits of overparametrization.

Regularized Linear Regression: A Precise Analysis of the Estimation Error

- Computer Science, COLT
- 2015

This paper focuses on the problem of linear regression and considers a general class of optimization methods that minimize a loss function measuring the misfit of the model to the observations, with an added structure-inducing regularization term.

Benign overfitting in linear regression

- Computer Science, Proceedings of the National Academy of Sciences
- 2020

A characterization of linear regression problems for which the minimum norm interpolating prediction rule has near-optimal prediction accuracy shows that overparameterization is essential for benign overfitting in this setting: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size.
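The "number of unimportant directions" in the entry above is formalized by Bartlett et al. (2020) through two effective-rank quantities of the covariance spectrum, $r_k(\Sigma) = \sum_{i>k}\lambda_i / \lambda_{k+1}$ and $R_k(\Sigma) = (\sum_{i>k}\lambda_i)^2 / \sum_{i>k}\lambda_i^2$. A small illustrative sketch (the function name and example spectrum are our own, not from the paper):

```python
# Effective ranks r_k and R_k from Bartlett et al. (2020); the example
# spectrum below is a hypothetical spiked covariance, chosen for illustration.
import numpy as np

def effective_ranks(eigs, k):
    """eigs: covariance eigenvalues, sorted in decreasing order."""
    tail = eigs[k:]                            # lambda_{k+1}, lambda_{k+2}, ...
    r_k = tail.sum() / tail[0]                 # sum_{i>k} lambda_i / lambda_{k+1}
    R_k = tail.sum() ** 2 / (tail ** 2).sum()  # (sum lambda_i)^2 / sum lambda_i^2
    return r_k, R_k

# Two large "signal" eigenvalues followed by a flat tail of 1000 small ones.
# For a perfectly flat tail of m eigenvalues, r_k = R_k = m.
eigs = np.concatenate([np.array([100.0, 50.0]), np.full(1000, 0.01)])
print(effective_ranks(eigs, k=2))
```

Benign overfitting of the minimum-norm interpolator requires, roughly, that the tail beyond the signal directions is both heavy enough and flat enough: $r_k$ large relative to the sample size captures the "many unimportant directions" condition quoted above.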