# Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring

```bibtex
@inproceedings{Ahn2012BayesianPS,
  title     = {Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring},
  author    = {Sungjin Ahn and Anoop Korattikara Balan and Max Welling},
  booktitle = {ICML},
  year      = {2012}
}
```

In this paper we address the following question: "Can we approximately sample from a Bayesian posterior distribution if we are only allowed to touch a small mini-batch of data-items for every sample we generate?" An algorithm based on the Langevin equation with stochastic gradients (SGLD) was previously proposed to solve this, but its mixing rate was slow. By leveraging the Bayesian Central Limit Theorem, we extend the SGLD algorithm so that at high mixing rates it will sample from a normal…
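The SGLD baseline that the paper builds on can be sketched in a few lines. The following is a minimal, illustrative version, not the paper's SGFS algorithm: it samples the posterior mean of a Gaussian using mini-batch gradients plus injected Gaussian noise. The model choices (unit-variance likelihood, an N(0, 10) prior, the step size and batch size) are assumptions made for this demo.

```python
import numpy as np

# Minimal SGLD sketch (the baseline the paper extends), not SGFS itself.
# Illustrative model: x_i ~ N(theta, 1) with prior theta ~ N(0, prior_var).
rng = np.random.default_rng(0)
N = 1000
data = rng.normal(2.0, 1.0, size=N)

def sgld(data, n_iters=5000, batch=50, eps=1e-3, prior_var=10.0):
    N = len(data)
    theta = 0.0
    samples = np.empty(n_iters)
    for t in range(n_iters):
        idx = rng.integers(0, N, size=batch)  # touch only a mini-batch
        grad_log_prior = -theta / prior_var
        # Rescale the mini-batch gradient to estimate the full-data gradient.
        grad_log_lik = (N / batch) * np.sum(data[idx] - theta)
        # Langevin step: half the step size times the stochastic gradient,
        # plus Gaussian noise with variance equal to the step size.
        theta += 0.5 * eps * (grad_log_prior + grad_log_lik) \
                 + rng.normal(0.0, np.sqrt(eps))
        samples[t] = theta
    return samples

samples = sgld(data)
```

After a short burn-in the chain fluctuates around the posterior mean (close to the data mean here, since the prior is weak). The paper's contribution, SGFS, additionally preconditions this update using an estimate of the Fisher information.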

## 22 Citations

Mini-Batch Metropolis–Hastings With Reversible SGLD Proposal

- Computer Science, Journal of the American Statistical Association
- 2020

This work proposes a general framework for performing MH-MCMC using mini-batches rather than the full dataset, shows that this gives rise to an approximately tempered stationary distribution, and proves that the algorithm preserves the modes of the original target distribution.

Bayesian Conditional Density Filtering

- Computer Science, Journal of Computational and Graphical Statistics
- 2018

We propose a conditional density filtering (C-DF) algorithm for efficient online Bayesian inference. C-DF adapts MCMC sampling to the online setting, sampling from approximations to…

Scalable MCMC for Mixed Membership Stochastic Blockmodels

- Computer Science, AISTATS
- 2016

The proposed algorithm, based on the stochastic gradient Riemannian Langevin sampler, strictly dominates competing algorithms in all experiments, achieving both faster speed and higher accuracy at every iteration than the current state-of-the-art algorithm based on stochastic variational inference.

Coresets for Scalable Bayesian Logistic Regression

- Computer Science, NIPS
- 2016

This paper develops an efficient coreset construction algorithm for Bayesian logistic regression models that provides theoretical guarantees on the size and approximation quality of the coreset, both for fixed, known datasets and in expectation for a wide class of data-generative models.

Approximations of Markov Chains and High-Dimensional Bayesian Inference

- Computer Science
- 2015

A framework is proposed for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget.

Optimal approximating Markov chains for Bayesian inference

- Computer Science
- 2015

This work gives simple, sharp bounds for uniform approximations of uniformly mixing Markov chains, suggests a notion of optimality that incorporates computation time and approximation error, and uses these bounds to draw general conclusions about the properties of good approximations in the uniformly mixing setting.

Stochastic Gradient Markov Chain Monte Carlo

- Computer Science
- 2019

A particular class of scalable Monte Carlo algorithms is presented: stochastic gradient Markov chain Monte Carlo (SGMCMC), which uses data subsampling to reduce the per-iteration cost of MCMC.

Approximations of Markov Chains and Bayesian Inference

- Computer Science, Mathematics
- 2015

A framework is proposed for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified discrepancy measure and computational budget.

Stochastic Gradient MCMC for State Space Models

- Computer Science, SIAM J. Math. Data Sci.
- 2019

This work proposes stochastic gradient estimators that control this bias by performing additional computation in a 'buffer' to avoid breaking dependencies, and develops novel SGMCMC samplers for discrete, continuous, and mixed-type SSMs with analytic message passing.

Asymptotic Simulated Annealing for Variational Inference

- Computer Science, 2018 IEEE Global Communications Conference (GLOBECOM)
- 2018

A novel optimization tool called asymptotically-annealed variational inference (AVI) is proposed for better convergence of VI to local optima, using ideas from small-variance asymptotics to efficiently search for better solutions.

## References

Showing 1–10 of 14 references.

Bayesian Learning via Stochastic Gradient Langevin Dynamics

- Computer Science, ICML
- 2011

In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic…

A tutorial on adaptive MCMC

- Computer Science, Stat. Comput.
- 2008

This work proposes a series of novel adaptive algorithms that prove robust and reliable in practice, and reviews commonly used criteria and the useful framework of stochastic approximation, which allows one to optimise these criteria systematically.

A Stochastic Quasi-Newton Method for Online Convex Optimization

- Computer Science, AISTATS
- 2007

Stochastic variants of the well-known BFGS quasi-Newton optimization method, in both full and memory-limited (L-BFGS) forms, are developed for online optimization of convex functions; these asymptotically outperform previous stochastic gradient methods for parameter estimation in conditional random fields.

Probabilistic Inference Using Markov Chain Monte Carlo Methods

- Computer Science
- 2011

The role of probabilistic inference in artificial intelligence is outlined, the theory of Markov chains is presented, and various Markov chain Monte Carlo algorithms are described, along with a number of supporting techniques.

Riemann manifold Langevin and Hamiltonian Monte Carlo methods

- Computer Science
- 2011

The methodology proposed automatically adapts to the local structure when simulating paths across this manifold, providing highly efficient convergence and exploration of the target density, and substantial improvements in the time-normalized effective sample size are reported when compared with alternative sampling approaches.

Riemann Manifold Langevin and Hamiltonian Monte Carlo

- Computer Science
- 2010

This paper proposes Metropolis adjusted Langevin and Hamiltonian Monte Carlo sampling methods defined on the Riemann manifold to resolve the shortcomings of existing Monte Carlo algorithms when…

Maximum likelihood estimation using the empirical Fisher information matrix

- Computer Science
- 2002

The introduction of an additional stochastic component into response models and incomplete data problems is shown to greatly increase the range of situations in which the estimator can be employed.

Low Rank Updates for the Cholesky Decomposition

- Computer Science
- 2004

This note shows how the Cholesky decomposition can be updated to incorporate low-rank additions or downdated for low-rank subtractions, and discusses a special case of an indefinite update of rank two.
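The rank-one case of such an update can be sketched as follows. This is the textbook rotation-based recurrence rather than code from the cited note, and the function name is ours: given the Cholesky factor of A, it produces the factor of A + xxᵀ in O(n²) instead of the O(n³) cost of refactorizing.

```python
import numpy as np

def chol_rank1_update(L, x):
    """Given lower-triangular L with A = L @ L.T, return the Cholesky
    factor of A + x x^T in O(n^2), without refactorizing from scratch."""
    L = L.copy()
    x = x.astype(float).copy()
    n = x.size
    for k in range(n):
        r = np.hypot(L[k, k], x[k])  # new diagonal entry
        c, s = r / L[k, k], x[k] / L[k, k]
        L[k, k] = r
        # Rotate the remainder of column k together with the work vector x.
        L[k+1:, k] = (L[k+1:, k] + s * x[k+1:]) / c
        x[k+1:] = c * x[k+1:] - s * L[k+1:, k]
    return L
```

A downdate (subtracting xxᵀ) follows the same pattern with hyperbolic rather than Givens-style rotations, and can fail when the downdated matrix is no longer positive definite.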

Classification using discriminative restricted Boltzmann machines

- Computer Science, ICML '08
- 2008

This paper presents an evaluation of different learning algorithms for RBMs that introduce a discriminative component to RBM training and improve their performance as classifiers, and demonstrates how discriminative RBMs can also be successfully employed in a semi-supervised setting.

The Tradeoffs of Large Scale Learning

- Computer Science, NIPS
- 2007

This contribution develops a theoretical framework that takes into account the effect of approximate optimization on learning algorithms. The analysis shows distinct tradeoffs for the case of…