On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length
@article{Jastrzebski2019OnTR,
  title={On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length},
  author={Stanislaw Jastrzebski and Zachary Kenton and Nicolas Ballas and Asja Fischer and Yoshua Bengio and Amos Storkey},
  journal={arXiv: Machine Learning},
  year={2019}
}
Stochastic Gradient Descent (SGD) based training of neural networks with a large learning rate or a small batch size typically ends in well-generalizing, flat regions of the weight space, as indicated by small eigenvalues of the Hessian of the training loss. [...] Key Result: In summary, our analysis of the dynamics of SGD in the subspace of the sharpest directions shows that they influence the regions that SGD steers to (where a larger learning rate or a smaller batch size results in wider regions visited), the …
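To make the quantity discussed in the abstract concrete, below is a minimal sketch (not the authors' released code, and independent of the linked repository) of estimating the sharpest direction of the training loss, i.e. the top Hessian eigenvector, via power iteration on Hessian-vector products in PyTorch, and then measuring the component of a single SGD step along that direction. The names `model`, `loss_fn`, `x`, `y`, and `lr` are hypothetical placeholders.

```python
# Minimal sketch: top Hessian eigenpair by power iteration on Hessian-vector
# products, then the length of an SGD step along that sharpest direction.
import torch

def flatten(tensors):
    """Concatenate a list of tensors into one flat vector."""
    return torch.cat([t.reshape(-1) for t in tensors])

def top_hessian_eigenpair(loss, params, iters=20):
    """Power iteration for the largest eigenvalue/eigenvector of the loss Hessian."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = flatten(grads)
    v = torch.randn_like(flat_grad)
    v = v / v.norm()
    eigval = 0.0
    for _ in range(iters):
        # Hessian-vector product: d/dtheta (grad . v) = H v
        hv = torch.autograd.grad(flat_grad @ v, params, retain_graph=True)
        hv = flatten([h.detach() for h in hv])
        eigval = torch.dot(hv, v).item()  # Rayleigh quotient estimate
        v = hv / (hv.norm() + 1e-12)
    return eigval, v

# Hypothetical usage: how large is one SGD step along the sharpest direction?
# loss = loss_fn(model(x), y)
# params = [p for p in model.parameters() if p.requires_grad]
# eigval, v = top_hessian_eigenpair(loss, params)
# step = -lr * flatten(torch.autograd.grad(loss, params, retain_graph=True))
# print("step length along sharpest direction:", torch.dot(step, v).abs().item())
```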
Supplemental Code
GitHub repo (via Papers with Code): code for "On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length", ICLR 2019.
39 Citations
- The Break-Even Point on Optimization Trajectories of Deep Neural Networks. ICLR, 2020. 19 citations. Highly Influenced.
- Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization. SDM, 2020. 10 citations.
- Curvature is Key: Sub-Sampled Loss Surfaces and the Implications for Large Batch Training. arXiv, 2020. 3 citations.
- Gradient Noise Convolution (GNC): Smoothing Loss Function for Distributed Large-Batch SGD. arXiv, 2019. 4 citations.
- S-SGD: Symmetrical Stochastic Gradient Descent with Weight Noise Injection for Reaching Flat Minima. arXiv, 2020. 1 citation.
- Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization. arXiv, 2020. Highly Influenced.
- Experimental exploration on loss surface of deep neural network. Int. J. Imaging Syst. Technol., 2020.
- Towards understanding the true loss surface of deep neural networks using random matrix theory and iterative spectral methods. 2019. 8 citations. Highly Influenced.
- Layer rotation: a surprisingly simple indicator of generalization in deep networks? 2019. 1 citation.
References (showing 1-10 of 34)
- On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. ICLR, 2017. 1,112 citations. Highly Influential.
- SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning. 2018. 15 citations.
- The Regularization Effects of Anisotropic Noise in Stochastic Gradient Descent. arXiv, 2018. 21 citations.
- Stochastic Gradient Descent Performs Variational Inference, Converges to Limit Cycles for Deep Networks. 2018 Information Theory and Applications Workshop (ITA), 2018. 148 citations.
- Empirical Analysis of the Hessian of Over-Parametrized Neural Networks. ICLR, 2018. 167 citations.
- High-dimensional dynamics of generalization error in neural networks. Neural Networks, 2020. 173 citations.