
Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring

@inproceedings{Ahn2012BayesianPS,
  title={Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring},
  author={Sungjin Ahn and Anoop Korattikara Balan and Max Welling},
  booktitle={ICML},
  year={2012}
}
• Published in ICML 26 June 2012
• Computer Science, Mathematics
In this paper we address the following question: "Can we approximately sample from a Bayesian posterior distribution if we are only allowed to touch a small mini-batch of data-items for every sample we generate?". An algorithm based on the Langevin equation with stochastic gradients (SGLD) was previously proposed to solve this, but its mixing rate was slow. By leveraging the Bayesian Central Limit Theorem, we extend the SGLD algorithm so that at high mixing rates it will sample from a normal…
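The SGLD baseline that this abstract refers to can be sketched in a few lines. This is a toy Gaussian-mean example with illustrative names and a fixed step size (SGLD as originally proposed anneals it), not the Fisher-scoring sampler the paper introduces, which additionally preconditions the update with an estimate of the Fisher information.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: x_i ~ N(theta, 1) with a N(0, 1) prior on theta.
N = 1000
data = rng.normal(1.5, 1.0, size=N)

def grad_log_prior(theta):
    return -theta  # gradient of log N(theta; 0, 1)

def grad_log_lik(theta, batch):
    return np.sum(batch - theta)  # gradient of the Gaussian log-likelihood on a batch

theta = 0.0
eps = 1e-3          # step size, fixed here for simplicity
batch_size = 10
samples = []
for t in range(5000):
    batch = rng.choice(data, size=batch_size, replace=False)
    # Mini-batch estimate of the full-data gradient of the log posterior
    g = grad_log_prior(theta) + (N / batch_size) * grad_log_lik(theta, batch)
    # Langevin step: half a step along the gradient, plus Gaussian noise of matched scale
    theta += 0.5 * eps * g + rng.normal(0.0, np.sqrt(eps))
    samples.append(theta)
```

For this conjugate model the posterior mean is N·x̄/(N+1) ≈ x̄, so the mean of the post-burn-in samples should land close to the sample mean of the data.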

Citations

Mini-Batch Metropolis–Hastings With Reversible SGLD Proposal
• Computer Science
Journal of the American Statistical Association
• 2020
This work proposes a general framework for performing Metropolis–Hastings MCMC using mini-batches rather than the whole dataset, shows that this gives rise to an approximately tempered stationary distribution, and proves that the algorithm preserves the modes of the original target distribution.
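The naive mini-batch scheme this summary alludes to, accepting or rejecting with a rescaled mini-batch estimate of the log-likelihood ratio, can be sketched as follows. This toy (Gaussian mean, flat prior, illustrative constants) is not the paper's reversible-SGLD proposal; as the summary notes, the noisy acceptance step yields approximately a tempered stationary distribution rather than the exact posterior.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy target: posterior over the mean of N(theta, 1) data under a flat prior.
N = 500
data = rng.normal(2.0, 1.0, size=N)

def log_lik(theta, batch):
    return -0.5 * np.sum((batch - theta) ** 2)

theta, samples = 0.0, []
batch_size = 50
for t in range(20000):
    prop = theta + rng.normal(0.0, 0.05)  # random-walk proposal
    batch = rng.choice(data, size=batch_size, replace=False)
    # Rescaled mini-batch log-likelihood ratio: a noisy estimate of the
    # full-data ratio, so the chain targets only an approximate (tempered) posterior.
    log_r = (N / batch_size) * (log_lik(prop, batch) - log_lik(theta, batch))
    if np.log(rng.random()) < log_r:
        theta = prop
    samples.append(theta)
```

The noise in the acceptance ratio widens the sampled distribution, but the chain should still concentrate around the data mean after burn-in.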
Bayesian Conditional Density Filtering
• Computer Science
Journal of Computational and Graphical Statistics
• 2018
We propose a conditional density filtering (C-DF) algorithm for efficient online Bayesian inference. C-DF adapts MCMC sampling to the online setting, sampling from approximations to…
Scalable MCMC for Mixed Membership Stochastic Blockmodels
• Computer Science
AISTATS
• 2016
The algorithm, based on the stochastic gradient Riemannian Langevin sampler, achieves both faster speed and higher accuracy at every iteration than the current state-of-the-art algorithm based on stochastic variational inference; the experimental results show that SG-MCMC strictly dominates competing algorithms in all cases.
Coresets for Scalable Bayesian Logistic Regression
• Computer Science
NIPS
• 2016
This paper develops an efficient coreset construction algorithm for Bayesian logistic regression models that provides theoretical guarantees on the size and approximation quality of the coreset -- both for fixed, known datasets, and in expectation for a wide class of data generative models.
An Algorithm for Distributed Bayesian Inference in Generalized Linear Models
• Computer Science
• 2019
This work develops a scalable divide-and-conquer extension of Monte Carlo algorithms: it divides the data into a sufficiently large number of subsets, draws parameters in parallel on each subset using a "powered" likelihood, and produces Monte Carlo draws of the parameter by combining the draws obtained from each subset.
Approximations of Markov Chains and High-Dimensional Bayesian Inference
• Computer Science
• 2015
This work proposes a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget.
Optimal approximating Markov chains for Bayesian inference
• Computer Science
• 2015
This work gives simple, sharp bounds for uniform approximations of uniformly mixing Markov chains, suggests a notion of optimality that incorporates computation time and approximation error, and uses these bounds to draw general conclusions about the properties of good approximations in the uniformly mixing setting.
Stochastic Gradient Markov Chain Monte Carlo
• Computer Science
• 2019
This work presents stochastic gradient Markov chain Monte Carlo (SGMCMC), a class of scalable Monte Carlo algorithms that uses data subsampling to reduce the per-iteration cost of MCMC.
Approximations of Markov Chains and Bayesian Inference
• Computer Science, Mathematics
• 2015
This work proposes a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified discrepancy measure and computational budget.

References

Showing 1–10 of 14 references
Bayesian Learning via Stochastic Gradient Langevin Dynamics
• Computer Science
ICML
• 2011
In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic…
A tutorial on adaptive MCMC
• Computer Science
Stat. Comput.
• 2008
This work reviews criteria for adapting MCMC samplers and the useful framework of stochastic approximation, which allows one to systematically optimise generally used criteria, and proposes a series of novel adaptive algorithms which prove to be robust and reliable in practice.
A Stochastic Quasi-Newton Method for Online Convex Optimization
• Computer Science
AISTATS
• 2007
Stochastic variants of the well-known BFGS quasi-Newton optimization method, in both full and memory-limited (L-BFGS) forms, are developed for online optimization of convex functions; the resulting method asymptotically outperforms previous stochastic gradient methods for parameter estimation in conditional random fields.
Probabilistic Inference Using Markov Chain Monte Carlo Methods
The role of probabilistic inference in artificial intelligence is outlined, the theory of Markov chains is presented, and various Markov chain Monte Carlo algorithms are described, along with a number of supporting techniques.
Riemann manifold Langevin and Hamiltonian Monte Carlo methods
• Computer Science
• 2011
The methodology proposed automatically adapts to the local structure when simulating paths across this manifold, providing highly efficient convergence and exploration of the target density, and substantial improvements in the time‐normalized effective sample size are reported when compared with alternative sampling approaches.
Riemann Manifold Langevin and Hamiltonian Monte Carlo
This paper proposes Metropolis adjusted Langevin and Hamiltonian Monte Carlo sampling methods defined on the Riemann manifold to resolve the shortcomings of existing Monte Carlo algorithms when…
Maximum likelihood estimation using the empirical fisher information matrix
The introduction of an additional stochastic component into response models and incomplete data problems is shown to greatly increase the range of situations in which the estimator can be employed.
Low Rank Updates for the Cholesky Decomposition
This note shows how the Cholesky decomposition can be updated to incorporate low-rank additions or downdated for low-rank subtractions, and discusses a special case of an indefinite rank-two update.
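The rank-one update case can be sketched directly; `chol_update` is an illustrative name, and the loop is the standard column-by-column sequence of Givens-style rotations.

```python
import numpy as np

def chol_update(L, x):
    """Given lower-triangular L with A = L @ L.T, return the Cholesky
    factor of A + x @ x.T in O(n^2), versus O(n^3) for refactorizing."""
    L, x = L.copy(), x.copy()
    n = x.size
    for k in range(n):
        r = np.hypot(L[k, k], x[k])            # new diagonal entry
        c, s = r / L[k, k], x[k] / L[k, k]     # rotation coefficients
        L[k, k] = r
        if k + 1 < n:
            L[k + 1:, k] = (L[k + 1:, k] + s * x[k + 1:]) / c
            x[k + 1:] = c * x[k + 1:] - s * L[k + 1:, k]
    return L
```

A downdate (subtracting x xᵀ) follows the same pattern with hyperbolic rotations, and is only well defined when the downdated matrix remains positive definite.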
Classification using discriminative restricted Boltzmann machines
• Computer Science
ICML '08
• 2008
This paper presents an evaluation of different learning algorithms for RBMs which aim at introducing a discriminative component to RBM training and improving their performance as classifiers, and demonstrates how discriminative RBMs can also be successfully employed in a semi-supervised setting.
The Tradeoffs of Large Scale Learning
• Computer Science
NIPS
• 2007
This contribution develops a theoretical framework that takes into account the effect of approximate optimization on learning algorithms. The analysis shows distinct tradeoffs for the case of…