# Benign Underfitting of Stochastic Gradient Descent

@article{Koren2022BenignUO,
  title={Benign Underfitting of Stochastic Gradient Descent},
  author={Tomer Koren and Roi Livni and Yishay Mansour and Uri Sherman},
  journal={ArXiv},
  year={2022},
  volume={abs/2202.13361}
}
• Published 27 February 2022
• Computer Science
• ArXiv
We study to what extent may stochastic gradient descent (SGD) be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to training data. We consider the fundamental stochastic convex optimization framework, where (one-pass, without-replacement) SGD is classically known to minimize the population risk at rate O(1/√n), and prove that, surprisingly, there exist problem instances where the SGD solution exhibits both empirical risk and…
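To make the setting concrete, here is a minimal sketch of one-pass, without-replacement SGD on a toy stochastic convex problem, reporting the empirical risk on the training set alongside a fresh-sample estimate of the population risk. The quadratic objective, the step size 1/√n, and all names are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def one_pass_sgd(samples, grad, x0, lr):
    """One-pass, without-replacement SGD: visit each training
    example exactly once and return the average iterate, as in
    the classical O(1/sqrt(n)) analysis."""
    x = x0.copy()
    iterates = [x.copy()]
    for z in samples:              # single pass over the data
        x = x - lr * grad(x, z)    # step on the current example only
        iterates.append(x.copy())
    return np.mean(iterates, axis=0)

# Toy instance: minimize F(x) = E_z[ 0.5 * ||x - z||^2 ] with z ~ N(mu, I).
rng = np.random.default_rng(0)
d, n = 5, 400
mu = np.ones(d)
train = rng.normal(mu, 1.0, size=(n, d))

grad = lambda x, z: x - z                      # gradient of 0.5 * ||x - z||^2
x_bar = one_pass_sgd(train, grad, np.zeros(d), lr=1.0 / np.sqrt(n))

emp_risk = 0.5 * np.mean(np.sum((x_bar - train) ** 2, axis=1))
test = rng.normal(mu, 1.0, size=(10_000, d))   # fresh draws estimate the population risk
pop_risk = 0.5 * np.mean(np.sum((x_bar - test) ** 2, axis=1))
print(f"empirical risk: {emp_risk:.3f}, population risk estimate: {pop_risk:.3f}")
```

On this well-behaved quadratic the two risks nearly coincide; the paper's point is that there exist convex problem instances where exactly this procedure attains the O(1/√n) population risk while its empirical risk remains Ω(1).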
## Citations

SHOWING 1-3 OF 3 CITATIONS

### Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks

• Computer Science
ArXiv
• 2022
This paper considers gradient descent and stochastic gradient descent for training shallow neural networks (SNNs), and for both develops consistent excess risk bounds that balance optimization against generalization via early stopping, leveraging the concept of algorithmic stability.
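As a rough illustration of how early stopping balances optimization against generalization (a generic sketch under assumed interfaces, not the procedure from that paper): run SGD while monitoring held-out loss and return the iterate from the best-scoring round.

```python
import numpy as np

def sgd_with_early_stopping(grad, loss, train, val, x0, lr, epochs, patience=3):
    """Multi-pass SGD that stops once held-out loss stops improving:
    more passes shrink the optimization error but degrade stability,
    so the validation curve picks the balance point."""
    rng = np.random.default_rng(0)
    x, best_x, best_val, stale = x0.copy(), x0.copy(), np.inf, 0
    for _ in range(epochs):
        for i in rng.permutation(len(train)):      # one without-replacement pass
            x = x - lr * grad(x, train[i])
        v = float(np.mean([loss(x, z) for z in val]))
        if v < best_val:
            best_x, best_val, stale = x.copy(), v, 0
        else:
            stale += 1
            if stale >= patience:                  # no recent improvement: stop
                break
    return best_x
```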

### Max-Margin Works while Large Margin Fails: Generalization without Uniform Convergence

• Computer Science
ArXiv
• 2022
A new type of margin bound is proved, showing that above a certain signal-to-noise threshold any near-max-margin classifier achieves almost no test loss in these two settings, providing insight into why memorization can coexist with generalization.

### Making Progress Based on False Discoveries

A generic reduction is provided from the standard setting of statistical queries to the problem of estimating the gradients queried by gradient descent; this contrasts with classical bounds showing that with O(1/ε²) samples one can optimize the population risk to accuracy O(ε), but, as it turns out, via spurious gradients.

## References

SHOWING 1-10 OF 43 REFERENCES

### Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses

• Computer Science
NeurIPS
• 2020
This work provides sharp upper and lower bounds for several forms of SGD and full-batch GD on arbitrary Lipschitz nonsmooth convex losses, and obtains the first dimension-independent generalization bounds for multi-pass SGD in the nonsmooth case.

### SGD Generalizes Better Than GD (And Regularization Doesn't Help)

• Computer Science
COLT
• 2021
It is shown that, with the same number of steps, GD may overfit and emit a solution with Ω(1) generalization error, and that regularizing the empirical risk minimized by GD essentially does not change this result.

### Random Reshuffling: Simple Analysis with Vast Improvements

• Computer Science
NeurIPS
• 2020
The theory for strongly convex objectives tightly matches the known lower bounds for both Random Reshuffling (RR) and Shuffle-Once (SO), proves fast convergence of the Shuffle-Once algorithm, which shuffles the data only once, and substantiates the common practical heuristic of shuffling once or only a few times.
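The two orderings differ only in where the shuffle happens, as in this illustrative sketch (names and structure are assumptions of mine, not that paper's code):

```python
import numpy as np

def shuffled_sgd(samples, grad, x0, lr, epochs, scheme="RR", seed=0):
    """Without-replacement SGD under two orderings:
    'RR' (Random Reshuffling) draws a fresh permutation every epoch,
    'SO' (Shuffle-Once) draws one permutation up front and reuses it."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    order = rng.permutation(len(samples))          # SO: fixed for all epochs
    for _ in range(epochs):
        if scheme == "RR":
            order = rng.permutation(len(samples))  # RR: reshuffle each epoch
        for i in order:
            x = x - lr * grad(x, samples[i])
    return x
```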

### SGD without Replacement: Sharper Rates for General Smooth Convex Functions

• Computer Science
ICML
• 2019

### Stability and Generalization

• Computer Science, Mathematics
J. Mach. Learn. Res.
• 2002
Notions of stability for learning algorithms are defined, and it is shown how to use them to derive generalization error bounds based on the empirical error and the leave-one-out error.
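For orientation, the core relationship can be stated in its standard in-expectation form (paraphrased from memory, not quoted from the paper): if the learning algorithm A is β-uniformly stable, its expected generalization gap is at most β,

$$
\mathbb{E}_S\!\left[ R(A_S) - \hat{R}_S(A_S) \right] \;\le\; \beta,
\qquad\text{where}\quad
\sup_{z}\,\bigl|\ell(A_S, z) - \ell(A_{S'}, z)\bigr| \le \beta
$$

for any pair of samples S, S′ differing in a single example; the paper also derives high-probability analogues of such bounds.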

### The Implicit Bias of Benign Overfitting

It is shown that for regression, benign overfitting is "biased" towards certain types of problems, in the sense that its existence on one learning problem precludes its existence on other learning problems.