Corpus ID: 235458167

Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and Benign Overfitting

@inproceedings{Koehler2021UniformCO,
  title={Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and Benign Overfitting},
  author={Frederic Koehler and Lijia Zhou and Danica J. Sutherland and Nathan Srebro},
  booktitle={NeurIPS},
  year={2021}
}
We consider interpolation learning in high-dimensional linear regression with Gaussian data, and prove a generic uniform convergence guarantee on the generalization error of interpolators in an arbitrary hypothesis class in terms of the class’s Gaussian width. Applying the generic bound to Euclidean norm balls recovers the consistency result of Bartlett et al. (2020) for minimum-norm interpolators, and confirms a prediction of Zhou et al. (2020) for near-minimal-norm interpolators in the… 
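
Two standard objects behind the abstract (textbook definitions, not excerpts from the paper) are the Gaussian width of a hypothesis class $K \subseteq \mathbb{R}^d$ and the minimum Euclidean-norm interpolator of a training set $(X, y)$ with $X \in \mathbb{R}^{n \times d}$:

$$W(K) = \mathbb{E}_{g \sim \mathcal{N}(0, I_d)} \sup_{w \in K} \langle g, w \rangle, \qquad \hat{w}_{\mathrm{mn}} = \arg\min\{\|w\|_2 : Xw = y\} = X^\top (X X^\top)^{-1} y \quad (\text{when } XX^\top \text{ is invertible}).$$

Roughly speaking, the paper's guarantee controls the population error of every interpolator in such a class $K$ simultaneously, with the Gaussian width of a covariance-rescaled $K$ playing the role that Rademacher or Gaussian complexity plays in classical uniform convergence arguments; see the paper for the exact statement and conditions.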

Citations

Tight bounds for minimum l1-norm interpolation of noisy data
TLDR
This work complements the literature on “benign overfitting” for minimum ℓ2-norm interpolation, where asymptotic consistency can be achieved only when the features are effectively low-dimensional.
Optimistic Rates: A Unifying Theory for Interpolation Learning and Regularization in Linear Regression
TLDR
The optimistic rate bound is studied for linear regression with Gaussian data to recover some classical statistical guarantees for ridge and LASSO regression under random designs, and helps to obtain a precise understanding of the excess risk of near-interpolators in the over-parameterized regime.
Fast rates for noisy interpolation require rethinking the effects of inductive bias
TLDR
This paper proves that minimum ℓp-norm and maximum ℓp-margin interpolators achieve fast polynomial rates up to order 1/n for p > 1, compared to a logarithmic rate for p = 1, and provides experimental evidence that this trade-off may also play a crucial role in understanding non-linear interpolating models used in practice.
A geometrical viewpoint on the benign overfitting property of the minimum $l_2$-norm interpolant estimator
TLDR
The Dvoretzky dimension appearing naturally in the authors' geometrical viewpoint coincides with the effective rank from [1, 39] and is the key tool to handle the behavior of the design matrix restricted to the subspace $V_{k+1:p}$ where overfitting happens.
Foolish Crowds Support Benign Overfitting
TLDR
A lower bound is proved on the excess risk of sparse interpolating procedures for linear regression with Gaussian data in the overparameterized regime, implying that their excess risk can converge at an exponentially slower rate than that of OLS, even when the ground truth is sparse.
Max-Margin Works while Large Margin Fails: Generalization without Uniform Convergence
TLDR
This analysis provides insight on why memorization can coexist with generalization: in this challenging regime where generalization occurs but UC fails, near-max-margin classifiers simultaneously contain some generalizable components and some overfitting components that memorize the data.
Kernel interpolation in Sobolev spaces is not consistent in low dimensions
TLDR
Sharp bounds are derived on the spectrum of random kernel matrices for kernels whose associated RKHS is a Sobolev space, using results from the theory of radial basis functions which might be of independent interest.
Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data
TLDR
This work considers the generalization error of two-layer neural networks trained to interpolation by gradient descent on the logistic loss following random initialization and shows that in this setting, neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly matching any noisy training labels, and simultaneously achieve test error close to the Bayes-optimal error.
Benign, Tempered, or Catastrophic: A Taxonomy of Overfitting
TLDR
This work argues that while benign overfitting has been instructive and fruitful to study, many real interpolating methods like neural networks do not fit benignly: modest noise in the training set causes nonzero excess risk at test time, implying these models are neither benign nor catastrophic but rather fall in an intermediate regime.
A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning
TLDR
This paper provides a succinct overview of this emerging theory of overparameterized ML (henceforth abbreviated as TOPML) that explains these recent findings through a statistical signal processing perspective and emphasizes the unique aspects that define the TOPML research area as a subfield of modern ML theory.
...

References

Showing 1-10 of 44 references
On Uniform Convergence and Low-Norm Interpolation Learning
TLDR
This work argues that the consistency of the minimal-norm interpolator can be explained with a slightly weaker, yet standard, notion: uniform convergence of zero-error predictors, which it uses to bound the generalization error of low-norm (but not minimal-norm) interpolating predictors.
Exact Gap between Generalization Error and Uniform Convergence in Random Feature Models
TLDR
This work shows that, in the setting where the classical uniform convergence bound is vacuous, uniform convergence over the interpolators still gives a non-trivial bound on the test error of interpolating solutions, providing a first exact comparison between test errors and uniform convergence bounds for interpolators beyond simple linear models.
Failures of model-dependent generalization bounds for least-norm interpolation
TLDR
Any valid generalization bound of a type that is commonly proved in statistical learning theory must sometimes be very loose when applied to analyze the least-norm interpolant, for a variety of natural joint distributions on training examples.
Harmless interpolation of noisy data in regression
TLDR
A bound on how well such interpolative solutions can generalize to fresh test data is given, and it is shown that this bound generically decays to zero with the number of extra features, thus characterizing an explicit benefit of overparameterization.
In Defense of Uniform Convergence: Generalization via derandomization with an application to interpolating predictors
TLDR
The generalization error of a learned predictor $\hat h$ is studied in terms of that of a surrogate (potentially randomized) predictor that is coupled to $\hat h$ and designed to trade empirical risk for control of generalization error.
Uniform convergence may be unable to explain generalization in deep learning
TLDR
Through numerous experiments, doubt is cast on the power of uniform convergence-based generalization bounds to provide a complete picture of why overparameterized deep networks generalize well.
Benign overfitting in ridge regression
TLDR
This work provides non-asymptotic generalization bounds for overparametrized ridge regression that depend on the arbitrary covariance structure of the data, and shows that those bounds are tight for a range of regularization parameter values.
Surprises in High-Dimensional Ridgeless Least Squares Interpolation
TLDR
This paper recovers, in a precise quantitative way, several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk and the potential benefits of overparametrization.
Regularized Linear Regression: A Precise Analysis of the Estimation Error
TLDR
This paper focuses on the problem of linear regression and considers a general class of optimization methods that minimize a loss function measuring the misfit of the model to the observations, with an added structure-inducing regularization term.
Benign overfitting in linear regression
TLDR
A characterization of linear regression problems for which the minimum norm interpolating prediction rule has near-optimal prediction accuracy shows that overparameterization is essential for benign overfitting in this setting: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size.
...
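
Below is a minimal numerical sketch in Python (not code from the paper or from any work listed above) of the object most of these works study: the minimum $\ell_2$-norm interpolator of noisy labels under a Gaussian design. The spiked covariance spectrum, problem sizes, and noise level are arbitrary illustrative assumptions.

import numpy as np

# Minimum l2-norm interpolation with Gaussian features: an illustration only.
# The spectrum, dimensions, and noise level below are arbitrary assumptions,
# not values taken from the paper or its references.
rng = np.random.default_rng(0)
n, d, sigma = 100, 10_000, 0.5

# Spiked diagonal covariance: one strong direction plus many weak ones, a simple
# finite-dimensional stand-in for the covariances discussed in this literature.
spectrum = np.full(d, 1e-3)
spectrum[0] = 1.0

w_star = np.zeros(d)
w_star[0] = 1.0                                        # signal lives in the strong direction

X = rng.standard_normal((n, d)) * np.sqrt(spectrum)    # rows x_i ~ N(0, diag(spectrum))
y = X @ w_star + sigma * rng.standard_normal(n)        # noisy labels

# Minimum l2-norm interpolator: w_hat = X^+ y (= X^T (X X^T)^{-1} y when X has
# full row rank), which fits the training labels exactly.
w_hat = np.linalg.pinv(X) @ y

train_mse = np.mean((X @ w_hat - y) ** 2)
# Population risk under this diagonal Gaussian design, in closed form:
# E[(x^T w - y)^2] = (w - w*)^T Sigma (w - w*) + sigma^2.
test_mse = np.sum(spectrum * (w_hat - w_star) ** 2) + sigma ** 2
null_mse = spectrum[0] * w_star[0] ** 2 + sigma ** 2   # risk of always predicting 0

print(f"train MSE: {train_mse:.2e}")                   # ~0: the noisy labels are interpolated
print(f"test  MSE: {test_mse:.3f}")                    # compare with the noise floor sigma^2 = 0.25
print(f"null  MSE: {null_mse:.3f}")

The interpolator fits the noisy training labels exactly, and its population risk can be compared against the noise floor $\sigma^2$ and against the trivial zero predictor; the works above characterize precisely when, and for which covariances, such interpolation is harmless or even consistent.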