Corpus ID: 221819349

Stochastic Gradient Langevin Dynamics Algorithms with Adaptive Drifts

@article{Kim2020StochasticGL,
  title={Stochastic Gradient Langevin Dynamics Algorithms with Adaptive Drifts},
  author={Sehwan Kim and Qifan Song and Faming Liang},
  journal={ArXiv},
  year={2020},
  volume={abs/2009.09535}
}
Bayesian deep learning offers a principled way to address many issues concerning safety of artificial intelligence (AI), such as model uncertainty, model interpretability, and prediction bias. However, due to the lack of efficient Monte Carlo algorithms for sampling from the posterior of deep neural networks (DNNs), Bayesian deep learning has not yet powered our AI system. We propose a class of adaptive stochastic gradient Markov chain Monte Carlo (SGMCMC) algorithms, where the drift function is… 
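
Purely as an illustrative sketch of the general idea of an adaptive drift (the abstract's own definition is elided above, so this is an assumption, not the paper's update rule), an SGLD step whose drift is rescaled by Adam-style moment estimates might look like the following; all names and constants are placeholders.

import numpy as np

def adaptive_drift_sgld_step(theta, grad_log_post, m, v, step, t,
                             beta1=0.9, beta2=0.999, eps=1e-8):
    # One SGLD-style step whose drift is rescaled by Adam-like moment
    # estimates. Illustrative only: grad_log_post, the moment updates, and
    # all constants are assumptions, not the update rule of the paper.
    g = grad_log_post(theta)                      # stochastic gradient of the log-posterior
    m = beta1 * m + (1 - beta1) * g               # running first-moment estimate
    v = beta2 * v + (1 - beta2) * g ** 2          # running second-moment estimate
    m_hat = m / (1 - beta1 ** t)                  # bias correction (t is the 1-based iteration)
    v_hat = v / (1 - beta2 ** t)
    drift = m_hat / (np.sqrt(v_hat) + eps)        # adaptive drift direction
    noise = np.sqrt(step) * np.random.randn(*theta.shape)
    return theta + 0.5 * step * drift + noise, m, v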

Citations

Differentially private training of neural networks with Langevin dynamics for calibrated predictive uncertainty

This work highlights and exploits parallels between stochastic gradient Langevin dynamics, a scalable Bayesian inference technique for training deep neural networks, and DP-SGD, in order to train differentially private Bayesian neural networks with minor adjustments to the original (DP-SGD) algorithm.
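
For intuition, here is a minimal sketch of the DP-SGD ingredients referred to above (per-example gradient clipping plus calibrated Gaussian noise); the function name, clip norm, and noise multiplier are illustrative placeholders, not the cited paper's exact procedure.

import numpy as np

def dp_noisy_gradient(per_example_grads, clip_norm=1.0, noise_multiplier=1.0):
    # DP-SGD-style privatized gradient: clip each per-example gradient to
    # clip_norm, sum, add Gaussian noise with scale noise_multiplier * clip_norm,
    # then average. Hyperparameter values here are illustrative placeholders.
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    g_sum = np.sum(clipped, axis=0)
    noise = noise_multiplier * clip_norm * np.random.randn(*g_sum.shape)
    return (g_sum + noise) / len(per_example_grads)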

Efficient and Generalizable Tuning Strategies for Stochastic Gradient MCMC

A novel bandit-based algorithm is proposed that tunes the SGMCMC hyperparameters by minimizing the Stein discrepancy between the true posterior and its Monte Carlo approximation.

Sample-dependent Adaptive Temperature Scaling for Improved Calibration

This work proposes predicting a different temperature value for each input, allowing the mismatch between confidence and accuracy to be adjusted at a finer granularity, and the method is applied post hoc to off-the-shelf pre-trained classifiers.
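
A minimal sketch of temperature scaling with a per-input temperature, assuming the temperatures come from some hypothetical predictor; the values and shapes below are placeholders.

import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def temperature_scaled_probs(logits, temperature):
    # Divide logits by a temperature before the softmax; a sample-dependent
    # variant uses one temperature per input (shape (N, 1)) instead of a scalar.
    return softmax(logits / temperature)

logits = np.random.randn(3, 4)                    # 3 inputs, 4 classes
p_global = temperature_scaled_probs(logits, 1.5)  # one shared temperature
per_sample_T = np.array([[0.8], [1.2], [2.0]])    # hypothetical predicted temperatures
p_adaptive = temperature_scaled_probs(logits, per_sample_T)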

References

Showing 1-10 of 46 references

Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks

This work proposes combining adaptive preconditioners with Stochastic Gradient Langevin Dynamics, gives theoretical properties on asymptotic convergence and predictive risk, and presents empirical results for Logistic Regression, Feedforward Neural Nets, and Convolutional Neural Nets demonstrating that the preconditioned SGLD method gives state-of-the-art performance.
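
A minimal sketch of a preconditioned SGLD step with an RMSprop-style diagonal preconditioner, assuming the common formulation and omitting the curvature-correction (Gamma) term; names and constants are illustrative.

import numpy as np

def psgld_step(theta, grad_log_post, v, step, alpha=0.99, lam=1e-5):
    # Preconditioned SGLD step (sketch): the drift and the injected noise are
    # both rescaled by a diagonal preconditioner built from a running
    # second-moment estimate of the stochastic gradient.
    g = grad_log_post(theta)
    v = alpha * v + (1 - alpha) * g ** 2          # running second-moment estimate
    G = 1.0 / (lam + np.sqrt(v))                  # diagonal preconditioner
    noise = np.sqrt(step * G) * np.random.randn(*theta.shape)
    return theta + 0.5 * step * G * g + noise, v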

Bayesian Learning via Stochastic Gradient Langevin Dynamics

In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic gradient optimization algorithm, the iterates will converge to samples from the true posterior distribution as the stepsize is annealed.
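
For reference, a sketch of the standard SGLD update in this convention (the gradient callables and argument names are placeholders).

import numpy as np

def sgld_step(theta, grad_log_prior, grad_log_lik_batch, batch, N, eps):
    # One SGLD step: a half-step of stochastic gradient ascent on the
    # log-posterior (mini-batch likelihood gradient rescaled by N / batch size
    # to estimate the full-data gradient) plus injected N(0, eps) noise.
    drift = grad_log_prior(theta) + (N / len(batch)) * grad_log_lik_batch(theta, batch)
    noise = np.sqrt(eps) * np.random.randn(*theta.shape)
    return theta + 0.5 * eps * drift + noise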

The True Cost of SGLD

This work studies the mean squared error of Lipschitz functionals in strongly log-concave models with i.i.d. data as the data set size grows, and shows that, for a given batch size, controlling the bias of SGLD requires choosing the stepsize so small that the computational cost of reaching a target accuracy is roughly the same for all batch sizes.

Approximation Analysis of Stochastic Gradient Langevin Dynamics by using Fokker-Planck Equation and Ito Process

This work theoretically analyzes the SGLD algorithm with constant stepsize and shows, using the Fokker-Planck equation, that the probability distribution of the random variables generated by the SGLD algorithm converges to the Bayesian posterior.
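
For context, the continuous-time object behind this kind of analysis, stated here in one standard convention that need not match the paper's notation: writing U for the negative log-posterior, the Langevin diffusion d\theta_t = -\nabla U(\theta_t)\,dt + \sqrt{2}\,dW_t has the Fokker-Planck equation

\partial_t p_t(\theta) = \nabla \cdot \big( p_t(\theta)\, \nabla U(\theta) \big) + \Delta p_t(\theta),
\qquad \pi(\theta) \propto \exp\{-U(\theta)\} \ \text{is its stationary density.}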

Stochastic Gradient Hamiltonian Monte Carlo

This work introduces a variant that uses second-order Langevin dynamics with a friction term that counteracts the effects of the noisy gradient, maintaining the desired target distribution as the invariant distribution.
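
A minimal sketch of the momentum-with-friction update being described, assuming a common formulation and a zero estimate of the gradient-noise term; the friction value is a placeholder.

import numpy as np

def sghmc_step(theta, v, grad_log_post, lr, friction=0.1):
    # One SGHMC-style step: momentum update with a friction term that damps
    # the effect of gradient noise, plus matched injected Gaussian noise.
    noise = np.sqrt(2 * friction * lr) * np.random.randn(*theta.shape)
    v = (1 - friction) * v + lr * grad_log_post(theta) + noise
    return theta + v, v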

Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis

The present work provides a nonasymptotic analysis in the context of non-convex learning problems, giving finite-time guarantees for SGLD to find approximate minimizers of both empirical and population risks.

Consistency and Fluctuations For Stochastic Gradient Langevin Dynamics

This article proves that, under verifiable assumptions, the SGLD algorithm is consistent, satisfies a central limit theorem (CLT), and that its asymptotic bias-variance decomposition can be characterized by an explicit functional of the step-size sequence (δ_m)_{m≥0}.

Exploration of the (Non-)Asymptotic Bias and Variance of Stochastic Gradient Langevin Dynamics

A modified SGLD is derived that removes the asymptotic bias due to the variance of the stochastic gradients up to first order in the step size, and bounds on the finite-time bias, variance, and mean squared error are obtained.

Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning

This work develops Cyclical Stochastic Gradient MCMC (SG-MCMC), which uses a cyclical stepsize schedule in which larger steps discover new modes and smaller steps characterize each mode, and proves non-asymptotic convergence of the proposed algorithm.
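
A minimal sketch of a cosine cyclical stepsize schedule of this kind; the exact functional form and the argument names are assumptions, not necessarily the paper's.

import math

def cyclical_stepsize(k, total_iters, num_cycles, base_step):
    # Cosine cyclical stepsize: each cycle starts at base_step (large steps
    # for exploring new modes) and decays toward zero (small steps for
    # characterizing the current mode).
    cycle_len = math.ceil(total_iters / num_cycles)
    pos = (k % cycle_len) / cycle_len             # position within the current cycle, in [0, 1)
    return 0.5 * base_step * (math.cos(math.pi * pos) + 1.0)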

An Adaptive Empirical Bayesian Method for Sparse Deep Learning

A novel adaptive empirical Bayesian method for sparse deep learning, where the sparsity is ensured via a class of self-adaptive spike-and-slab priors, which leads to state-of-the-art performance on MNIST and Fashion MNIST with shallow convolutional neural networks and state-of-the-art compression performance on CIFAR10 with Residual Networks.
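
A minimal sketch of a continuous spike-and-slab log-prior (a mixture of a narrow and a wide Gaussian on each weight); the mixture weight and scales are illustrative placeholders, not the paper's settings.

import numpy as np

def spike_and_slab_logprior(w, slab_prob=0.1, sigma_spike=1e-2, sigma_slab=1.0):
    # Log density of a continuous spike-and-slab prior on the weights: a
    # mixture of a narrow Gaussian "spike" at zero and a wide Gaussian "slab",
    # summed over all weights (independence assumed).
    def log_normal(x, sigma):
        return -0.5 * (x / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
    log_spike = np.log(1 - slab_prob) + log_normal(w, sigma_spike)
    log_slab = np.log(slab_prob) + log_normal(w, sigma_slab)
    return np.logaddexp(log_spike, log_slab).sum()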