Corpus ID: 49412311

Stochastic natural gradient descent draws posterior samples in function space

  • S. L. Smith, Daniel Duckworth, Quoc V. Le, Jascha Sohl-Dickstein
  • Published 2018
  • Computer Science, Mathematics
  • ArXiv
  • Recent work has argued that stochastic gradient descent can approximate the Bayesian uncertainty in model parameters near local minima. In this work we develop a similar correspondence for minibatch natural gradient descent (NGD). We prove that for sufficiently small learning rates, if the model predictions on the training set approach the true conditional distribution of labels given inputs, the stationary distribution of minibatch NGD approaches a Bayesian posterior near local minima. The …
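The sampling correspondence described in the abstract builds on stochastic gradient Langevin dynamics (Welling & Teh, listed in the references below), not on NGD itself, but the mechanism is easiest to see there: with a small step size, minibatch gradient updates plus injected Gaussian noise have a stationary distribution close to the Bayesian posterior. A minimal SGLD sketch on a toy conjugate Gaussian model, where the exact posterior is known for comparison — all variable names and constants here are illustrative, and this is not the paper's NGD scheme:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y_i ~ N(theta_true, 1), with prior theta ~ N(0, 1).
theta_true = 2.0
N = 1000
y = rng.normal(theta_true, 1.0, size=N)

# Conjugate model, so the exact posterior is available in closed form.
post_var = 1.0 / (1.0 + N)
post_mean = post_var * y.sum()

# SGLD update: theta += (eps/2) * minibatch gradient estimate + N(0, eps) noise.
eps = 1e-4        # small, fixed step size (the regime the theory assumes)
batch = 50
steps = 20_000
burn = 5_000
theta = 0.0
samples = []
for t in range(steps):
    idx = rng.integers(0, N, size=batch)
    # Unbiased minibatch estimate of the log-posterior gradient:
    # prior term (-theta) plus rescaled likelihood term.
    grad = -theta + (N / batch) * np.sum(y[idx] - theta)
    theta = theta + 0.5 * eps * grad + rng.normal(0.0, np.sqrt(eps))
    if t >= burn:
        samples.append(theta)

samples = np.asarray(samples)
print(post_mean, samples.mean())  # sample mean approaches the analytic posterior mean
```

At this step size the empirical mean of the iterates tracks the analytic posterior mean closely; the minibatch gradient noise slightly inflates the sampled variance, which is exactly the small-learning-rate correction the paper's analysis is concerned with.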
    4 Citations
    • A unified theory of adaptive stochastic gradient descent as Bayesian filtering
    • The large learning rate phase of deep learning: the catapult mechanism
    • Deep Learning is Singular, and That's Good
    • Fast Convergence of Langevin Dynamics on Manifold: Geodesics meet Log-Sobolev


    References
    • A Bayesian Perspective on Generalization and Stochastic Gradient Descent
    • Stochastic Gradient Descent as Approximate Bayesian Inference
    • Noisy Natural Gradient as Variational Inference
    • Natural Langevin Dynamics for Neural Networks
    • Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring
    • Bayesian Learning via Stochastic Gradient Langevin Dynamics
    • Don't Decay the Learning Rate, Increase the Batch Size
    • Coupling Adaptive Batch Sizes with Learning Rates