Harmless interpolation of noisy data in regression

@article{Muthukumar2019HarmlessIO,
  title={Harmless interpolation of noisy data in regression},
  author={V. Muthukumar and Kailas Vodrahalli and A. Sahai},
  journal={2019 IEEE International Symposium on Information Theory (ISIT)},
  year={2019},
  pages={2299-2303}
}
A continuing mystery in understanding the empirical success of deep neural networks has been their ability to achieve zero training error and yet generalize well, even when the training data is noisy and there are more parameters than data points. We investigate this "overparametrization" phenomenon in the classical underdetermined linear regression problem, where all solutions that minimize training error interpolate the data, including the noise. We give a bound on how well such interpolative…
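To make the setting concrete, the following minimal sketch (illustrative only, not the paper's code; the dimensions, noise level, and sparse signal are assumptions) computes the minimum-l2-norm interpolator of noisy data in an underdetermined linear regression and checks that it fits the training set exactly:

# Minimal sketch (illustrative; not the paper's code): minimum-l2-norm interpolation
# of noisy data in an underdetermined ("overparameterized") linear regression.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 500                                   # n samples, d >> n features
w_true = np.zeros(d)
w_true[:5] = 1.0                                 # a few "signal" directions (assumed for the example)
X = rng.standard_normal((n, d))
y = X @ w_true + 0.5 * rng.standard_normal(n)    # noisy labels

# Among all w satisfying Xw = y, the pseudo-inverse returns the minimum-l2-norm solution.
w_hat = np.linalg.pinv(X) @ y

print("training residual:", np.linalg.norm(X @ w_hat - y))   # ~0: the noise is interpolated
X_test = rng.standard_normal((2000, d))
print("test MSE:", np.mean((X_test @ w_hat - X_test @ w_true) ** 2))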
Towards an Understanding of Benign Overfitting in Neural Networks
It is shown that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate, which to our knowledge is the first generalization result for such networks.
Overfitting Can Be Harmless for Basis Pursuit: Only to a Degree
To the best of our knowledge, this is the first result in the literature showing that, without any explicit regularization, the test error of a practical-to-compute overfitting solution can exhibit double descent and approach the order of the noise level independently of the null risk.
A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning
The rapid recent progress in machine learning (ML) has raised a number of scientific questions that challenge the longstanding dogma of the field. One of the most important riddles is the good…
The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks
The recent success of neural network models has shone light on a rather surprising statistical phenomenon: statistical models that perfectly fit noisy data can generalize well to unseen test data.
Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and Benign Overfitting
We consider interpolation learning in high-dimensional linear regression with Gaussian data, and prove a generic uniform convergence guarantee on the generalization error of interpolators in an…
Double Descent and Other Interpolation Phenomena in GANs
It is shown that overparameterization can improve generalization performance and accelerate the training process, and a new pseudo-supervised learning approach for GANs is developed where the training utilizes pairs of fabricated inputs in conjunction with real output samples.
Finite-sample analysis of interpolating linear classifiers in the overparameterized regime
Bounds on the population risk of the maximum-margin algorithm for two-class linear classification are proved, and it is shown that, with sufficient overparameterization, this algorithm trained on noisy data can achieve nearly optimal population risk.
Benign overfitting in ridge regression
Classical learning theory suggests that strong regularization is needed to learn a class with large complexity. This intuition is in contrast with the modern practice of machine learning, in…
Benign overfitting in linear regression
A characterization of linear regression problems for which the minimum-norm interpolating prediction rule has near-optimal prediction accuracy shows that overparameterization is essential for benign overfitting in this setting: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size.
How to make your optimizer generalize better
We study the implicit regularization of optimization methods for linear models interpolating the training data in the under-parametrized and over-parametrized regimes. For over-parameterized linear…

References

Showing 1-10 of 74 references.
To understand deep learning we need to understand kernel learning
It is argued that progress on understanding deep learning will be difficult until more tractable "shallow" kernel methods are better understood, and that new theoretical ideas are needed for understanding the properties of classical kernel methods.
Just Interpolate: Kernel "Ridgeless" Regression Can Generalize
This work isolates a phenomenon of implicit regularization for minimum-norm interpolated solutions which is due to a combination of high dimensionality of the input data, curvature of the kernel function, and favorable geometric properties of the data such as an eigenvalue decay of the empirical covariance and kernel matrices.
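As a generic illustration of the setup described here (a sketch under assumed data and an assumed RBF kernel, not the authors' experiments), kernel "ridgeless" regression interpolates the training labels with no explicit ridge penalty:

# Minimal sketch (illustrative): kernel "ridgeless" regression, i.e. interpolation
# with a kernel and no explicit regularization term. Kernel and data are assumptions.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and the rows of B.
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(40, 5))
y = np.sin(X.sum(axis=1)) + 0.1 * rng.standard_normal(40)    # noisy targets

K = rbf_kernel(X, X)
alpha = np.linalg.solve(K, y)        # no ridge term, so the fit interpolates the training data

X_new = rng.uniform(-1.0, 1.0, size=(5, 5))
y_pred = rbf_kernel(X_new, X) @ alpha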
Understanding overfitting peaks in generalization error: Analytical risk curves for l2 and l1 penalized interpolation
P. Mitra · Computer Science, Physics · ArXiv · 2019
A generative and fitting model pair is introduced, and it is shown that the overfitting peak can be dissociated from the point at which the fitting function gains enough degrees of freedom to match the data-generative model and thus provides good generalization.
Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate
A theoretical foundation for interpolated classifiers is developed by analyzing local interpolating schemes, including a geometric simplicial interpolation algorithm and singularly weighted k-nearest-neighbor schemes, and consistency or near-consistency is proved for these schemes in classification and regression problems.
Stochastic Mirror Descent on Overparameterized Nonlinear Models: Convergence, Implicit Regularization, and Generalization
This paper shows that in the overparameterized nonlinear setting, SMD with sufficiently small step size converges to a global minimum that is approximately the closest one in Bregman divergence, and experiments indicate that there is a clear difference in the generalization performance of the solutions obtained by different SMD algorithms.
Benign overfitting in linear regression
A characterization of linear regression problems for which the minimum-norm interpolating prediction rule has near-optimal prediction accuracy shows that overparameterization is essential for benign overfitting in this setting: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size.
The Generalization Error of Random Features Regression: Precise Asymptotics and the Double Descent Curve
Deep learning methods operate in regimes that defy the traditional statistical mindset. The neural network architectures often contain more parameters than training samples, and are so rich that they…
The Implicit Bias of Gradient Descent on Separable Data
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the…
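A minimal sketch of the behavior described here (the synthetic data, step size, and iteration count are assumptions, not the paper's setup): gradient descent on the unregularized logistic loss over separable data grows the weight norm without bound while the normalized direction stabilizes.

# Minimal sketch (illustrative): gradient descent on unregularized logistic loss
# over linearly separable data; the normalized iterate direction settles over time.
import numpy as np

rng = np.random.default_rng(2)
n, d = 100, 2
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0])                 # labels in {-1, +1}, separable through the origin
w = np.zeros(d)
lr = 0.1

for t in range(1, 20001):
    margins = y * (X @ w)
    # Gradient of mean log(1 + exp(-y * x.w)); 0.5*(1 - tanh(m/2)) is a stable sigmoid(-m).
    grad = -(X * (y * 0.5 * (1.0 - np.tanh(margins / 2.0)))[:, None]).mean(axis=0)
    w -= lr * grad
    if t % 5000 == 0:
        print(t, np.linalg.norm(w), w / np.linalg.norm(w))   # norm keeps growing; direction changes less and less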
Minimum norm solutions do not always generalize well for over-parameterized problems
It is empirically shown that the minimum-norm solution is not necessarily the proper gauge of good generalization in simplified scenarios, and that different models found by adaptive methods can outperform plain gradient methods.
Understanding deep learning requires rethinking generalization
These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth-two neural networks already have perfect finite-sample expressivity.