# Benign, Tempered, or Catastrophic: A Taxonomy of Overfitting

```bibtex
@article{Mallinar2022BenignTO,
  title   = {Benign, Tempered, or Catastrophic: A Taxonomy of Overfitting},
  author  = {Neil Rohit Mallinar and James B. Simon and Amirhesam Abedsoltan and Parthe Pandit and Mikhail Belkin and Preetum Nakkiran},
  journal = {ArXiv},
  year    = {2022},
  volume  = {abs/2207.06569}
}
```

The practical success of overparameterized neural networks has motivated the recent scientific study of interpolating methods, which perfectly fit their training data. Certain interpolating methods, including neural networks, can fit noisy training data without catastrophically bad test performance, in defiance of standard intuitions from statistical learning theory. Aiming to explain this, a body of recent work has studied benign overfitting, a phenomenon where some interpolating methods…

## 2 Citations

### The Final Ascent: When Bigger Models Generalize Worse on Noisy-Labeled Data

- Computer Science
- 2022

This work shows that under a sufficiently large noise-to-sample size ratio, generalization error eventually increases with model size, and empirically observes that the adverse effect of network size is more pronounced when robust training methods are employed to learn from noisy-labeled data.

### Deep Linear Networks can Benignly Overfit when Shallow Ones Do

- Computer Science
- ArXiv
- 2022

It is shown that randomly initialized deep linear networks can closely approximate or even match known bounds for the minimum ℓ2-norm interpolant, and it is revealed that interpolating deep linear models have exactly the same conditional variance as the minimum-ℓ2-norm solution.

### The Eigenlearning Framework: A Conservation Law Perspective on Kernel Regression and Wide Neural Networks

- Computer Science
- 2021

A simple unified framework giving closed-form estimates for the test risk and other generalization metrics of kernel ridge regression (KRR) is derived, enabled by the identification of a sharp conservation law which limits the ability of KRR to learn any orthonormal basis of functions.

### Learning from few examples with nonlinear feature maps

- Computer Science
- ArXiv
- 2022

This work considers the problem of data classification where the training set consists of just a few data points, and reveals key relationships between the geometry of an AI model's feature space, the structure of the underlying data distributions, and the model's generalisation capabilities.

### Generalizing with overly complex representations

- Psychology
- 2022

Representations enable cognitive systems to generalize from known experiences to new ones. The simplicity of a representation has been linked to its generalization ability. Conventionally, simple…

## References

Showing 1–10 of 68 references

### Benign overfitting in linear regression

- Computer Science
- Proceedings of the National Academy of Sciences
- 2020

A characterization of linear regression problems for which the minimum norm interpolating prediction rule has near-optimal prediction accuracy shows that overparameterization is essential for benign overfitting in this setting: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size.
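The minimum-norm interpolating rule described above is easy to demonstrate directly. The following is a minimal illustrative sketch (not the paper's own experiment): in an overparameterized linear regression with far more features than samples, the minimum ℓ2-norm interpolant fits noisy labels exactly yet still attains finite test error. The specific dimensions, sparsity pattern, and noise level are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 500                      # far more features than samples
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 1.0                 # signal lives in a few directions
y = X @ beta_true + rng.standard_normal(n)    # noisy labels

# Minimum l2-norm interpolant: beta = X^T (X X^T)^{-1} y, i.e. pinv(X) @ y
beta_hat = np.linalg.pinv(X) @ y

train_mse = np.mean((X @ beta_hat - y) ** 2)  # ~0: noise fit exactly

# Risk on fresh data from the same distribution (noiseless targets)
X_test = rng.standard_normal((2000, p))
test_mse = np.mean((X_test @ beta_hat - X_test @ beta_true) ** 2)

print(f"train MSE: {train_mse:.2e}")
print(f"test MSE:  {test_mse:.2f}")   # finite, not catastrophic
```

The interpolant drives training error to numerical zero while its test error stays bounded, which is the qualitative behavior the overparameterized regime makes possible.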

### Deep learning: a statistical viewpoint

- Computer Science
- Acta Numerica
- 2021

This article surveys recent progress in statistical learning theory, providing examples that illustrate these principles in simpler settings, and focuses specifically on the linear regime for neural networks, where the network can be approximated by a linear model.

### Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data

- Computer Science
- COLT
- 2022

This work considers the generalization error of two-layer neural networks trained to interpolation by gradient descent on the logistic loss following random initialization and shows that in this setting, neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly matching any noisy training labels, and simultaneously achieve minimax optimal test error.

### Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate

- Computer Science
- NeurIPS
- 2018

A theoretical foundation for interpolated classifiers is developed by analyzing local interpolating schemes, including a geometric simplicial interpolation algorithm and singularly weighted k-nearest-neighbor schemes, and consistency or near-consistency is proved for these schemes in classification and regression problems.

### To understand deep learning we need to understand kernel learning

- Computer Science
- ICML
- 2018

It is argued that progress on understanding deep learning will be difficult until more tractable "shallow" kernel methods are better understood, and that there is a need for new theoretical ideas for understanding the properties of classical kernel methods.

### The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks

- Computer Science
- ArXiv
- 2021

This work studies interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derives bounds on the excess risk when the covariates satisfy sub-Gaussianity and anti-concentration properties and the noise is independent and sub-Gaussian.

### Understanding deep learning (still) requires rethinking generalization

- Computer Science
- Commun. ACM
- 2021

These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and corroborate these experimental findings with a theoretical construction showing that simple depth two neural networks already have perfect finite sample expressivity.
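The random-label fitting phenomenon scales down to a toy setting. The sketch below (an illustration, not a reproduction of the paper's ImageNet/CIFAR experiments) trains a small two-layer ReLU network by full-batch gradient descent, implemented from scratch in NumPy, on a handful of points with purely random ±1 labels; the overparameterized network memorizes them. All sizes and hyperparameters are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, width = 8, 2, 128
X = rng.standard_normal((n, d))
y = rng.choice([-1.0, 1.0], size=n)        # labels are pure noise

# Two-layer ReLU network, both layers trained with gradient descent on MSE
W1 = rng.standard_normal((d, width)) / np.sqrt(d)
w2 = rng.standard_normal(width) / np.sqrt(width)
lr = 0.1

for _ in range(30000):
    H = np.maximum(X @ W1, 0.0)            # hidden activations
    err = H @ w2 - y                       # residuals
    grad_w2 = H.T @ err / n
    grad_H = np.outer(err, w2) * (H > 0)   # backprop through ReLU
    grad_W1 = X.T @ grad_H / n
    w2 -= lr * grad_w2
    W1 -= lr * grad_W1

pred = np.maximum(X @ W1, 0.0) @ w2
train_acc = np.mean(np.sign(pred) == y)
print("train accuracy on random labels:", train_acc)
```

With far more parameters than data points, gradient descent drives the network to classify every randomly labeled point correctly, mirroring in miniature the memorization result the paper establishes at scale.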

### Harmless interpolation of noisy data in regression

- Computer Science
- 2019 IEEE International Symposium on Information Theory (ISIT)
- 2019

A bound on how well such interpolative solutions can generalize to fresh test data is given, and it is shown that this bound generically decays to zero with the number of extra features, thus characterizing an explicit benefit of overparameterization.

### Understanding deep learning requires rethinking generalization

- Computer Science
- ICLR
- 2017

These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth two neural networks already have perfect finite sample expressivity.

### Reconciling modern machine-learning practice and the classical bias–variance trade-off

- Computer Science
- Proceedings of the National Academy of Sciences
- 2019

This work shows how classical theory and modern practice can be reconciled within a single unified performance curve and proposes a mechanism underlying its emergence, and provides evidence for the existence and ubiquity of double descent for a wide spectrum of models and datasets.