Stochastic natural gradient descent draws posterior samples in function space
@article{Smith2018StochasticNG,
  title   = {Stochastic natural gradient descent draws posterior samples in function space},
  author  = {S. L. Smith and Daniel Duckworth and Quoc V. Le and Jascha Sohl-Dickstein},
  journal = {ArXiv},
  year    = {2018},
  volume  = {abs/1806.09597}
}
Recent work has argued that stochastic gradient descent can approximate the Bayesian uncertainty in model parameters near local minima. In this work we develop a similar correspondence for minibatch natural gradient descent (NGD). We prove that for sufficiently small learning rates, if the model predictions on the training set approach the true conditional distribution of labels given inputs, the stationary distribution of minibatch NGD approaches a Bayesian posterior near local minima. …
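The NGD update the abstract refers to preconditions the minibatch gradient with the Fisher information matrix before taking a step. Below is a minimal sketch on a toy logistic-regression problem, using the empirical Fisher (outer products of per-example gradients) plus a small damping term for numerical stability; the data, model, and hyperparameters are illustrative and not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy logistic-regression data (illustrative setup, not from the paper).
n, d = 512, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = (rng.random(n) < 1 / (1 + np.exp(-X @ w_true))).astype(float)

def minibatch_ngd(X, y, lr=0.1, batch=32, steps=500, damping=1e-3):
    """Minibatch natural gradient descent with an empirical Fisher estimate."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        idx = rng.choice(len(y), size=batch, replace=False)
        Xb, yb = X[idx], y[idx]
        p = 1 / (1 + np.exp(-Xb @ w))
        per_ex_grads = (p - yb)[:, None] * Xb           # per-example NLL gradients
        g = per_ex_grads.mean(axis=0)                   # minibatch gradient
        F = per_ex_grads.T @ per_ex_grads / batch       # empirical Fisher matrix
        # Natural gradient step: precondition g by the (damped) Fisher.
        w = w - lr * np.linalg.solve(F + damping * np.eye(len(w)), g)
    return w

w_hat = minibatch_ngd(X, y)
```

Because the minibatch noise never vanishes at a fixed learning rate, the iterates `w_hat` fluctuate around the mode rather than converging to a point; the paper's claim concerns the stationary distribution of exactly this kind of fluctuation.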
4 Citations
- A unified theory of adaptive stochastic gradient descent as Bayesian filtering. ArXiv, 2018.
- The large learning rate phase of deep learning: the catapult mechanism. ArXiv, 2020.
- Fast Convergence of Langevin Dynamics on Manifold: Geodesics meet Log-Sobolev. NeurIPS, 2020.