Corpus ID: 216077795

Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring

@inproceedings{Ahn2012BayesianPS,
  title={Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring},
  author={Sungjin Ahn and Anoop Korattikara Balan and Max Welling},
  booktitle={ICML},
  year={2012}
}
In this paper we address the following question: "Can we approximately sample from a Bayesian posterior distribution if we are only allowed to touch a small mini-batch of data-items for every sample we generate?". An algorithm based on the Langevin equation with stochastic gradients (SGLD) was previously proposed to solve this, but its mixing rate was slow. By leveraging the Bayesian Central Limit Theorem, we extend the SGLD algorithm so that at high mixing rates it will sample from a normal… 
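To make the setting concrete, here is a minimal sketch of a single stochastic-gradient Langevin step with an optional diagonal preconditioner standing in for the Fisher-scoring machinery; the names (sgld_step, grad_log_prior, grad_log_lik, precond) are illustrative assumptions, not taken from the paper's code.

import numpy as np

def sgld_step(theta, grad_log_prior, grad_log_lik, data, batch_size, step_size,
              precond=None, rng=None):
    """One SGLD update: scaled stochastic gradient of the log posterior plus
    Gaussian noise whose covariance matches the step size (and preconditioner)."""
    rng = np.random.default_rng() if rng is None else rng
    n_total = data.shape[0]
    idx = rng.choice(n_total, size=batch_size, replace=False)
    # Unbiased mini-batch estimate of the full-data gradient of log p(theta | data).
    grad = grad_log_prior(theta) + (n_total / batch_size) * sum(
        grad_log_lik(theta, x) for x in data[idx])
    if precond is None:
        precond = np.ones_like(theta)  # plain SGLD; a Fisher-based diagonal would rescale the step
    noise = rng.normal(size=theta.shape) * np.sqrt(step_size * precond)
    return theta + 0.5 * step_size * precond * grad + noise

A full stochastic gradient Fisher scoring step would additionally maintain an online estimate of the empirical Fisher information and use it, rather than a fixed diagonal, to precondition both the drift and the injected noise, which is what allows fast mixing at large step sizes.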

Citations

Mini-Batch Metropolis–Hastings With Reversible SGLD Proposal
TLDR
This work proposes a general framework for performing MH-MCMC using mini-batches of the whole dataset, shows that this gives rise to an approximately tempered stationary distribution, and proves that the algorithm preserves the modes of the original target distribution.
Bayesian Conditional Density Filtering
ABSTRACT We propose a conditional density filtering (C-DF) algorithm for efficient online Bayesian inference. C-DF adapts MCMC sampling to the online setting, sampling from approximations to…
Scalable MCMC for Mixed Membership Stochastic Blockmodels
TLDR
The experimental results show that SG-MCMC strictly dominates competing algorithms in all cases; the algorithm, based on the stochastic gradient Riemannian Langevin sampler, achieves both faster speed and higher accuracy at every iteration than the current state-of-the-art algorithm based on stochastic variational inference.
Coresets for Scalable Bayesian Logistic Regression
TLDR
This paper develops an efficient coreset construction algorithm for Bayesian logistic regression models that provides theoretical guarantees on the size and approximation quality of the coreset -- both for fixed, known datasets, and in expectation for a wide class of data generative models.
Approximations of Markov Chains and High-Dimensional Bayesian Inference
TLDR
A framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget is proposed.
Optimal approximating Markov chains for Bayesian inference
TLDR
This work gives simple, sharp bounds for uniform approximations of uniformly mixing Markov chains, suggests a notion of optimality that incorporates computation time and approximation error, and uses these bounds to draw general conclusions about the properties of good approximations in the uniformly mixing setting.
Stochastic Gradient Markov Chain Monte Carlo
TLDR
A particular class of scalable Monte Carlo algorithms, stochastic gradient Markov chain Monte Carlo (SGMCMC) which utilizes data subsampling techniques to reduce the per-iteration cost of MCMC is presented.
Approximations of Markov Chains and Bayesian Inference
TLDR
A framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified discrepancy measure and computational budget is proposed.
Stochastic Gradient MCMC for State Space Models
TLDR
This work proposes stochastic gradient estimators that control this bias by performing additional computation in a 'buffer' to reduce the error from broken temporal dependencies, and develops novel SGMCMC samplers for discrete, continuous and mixed-type SSMs with analytic message passing.
Asymptotic Simulated Annealing for Variational Inference
TLDR
A novel optimization tool called asymptotically-annealed variational inference (AVI) is proposed to improve the convergence of VI to better local optima, using ideas from small-variance asymptotics to efficiently search for better solutions.

References

Showing 1-10 of 14 references
Bayesian Learning via Stochastic Gradient Langevin Dynamics
In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic gradient optimization algorithm we show that the iterates will converge to samples from the true posterior distribution as we anneal the stepsize.
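For context, the SGLD update introduced in this reference takes the following form (a standard rendering supplied here for clarity, not copied from this page), with stepsize \epsilon_t, dataset size N and mini-batch size n:

\theta_{t+1} = \theta_t + \frac{\epsilon_t}{2}\Big(\nabla \log p(\theta_t) + \frac{N}{n}\sum_{i=1}^{n} \nabla \log p(x_{t_i} \mid \theta_t)\Big) + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, \epsilon_t I)

Annealing \epsilon_t toward zero drives both the discretization error and the mini-batch noise to zero; the Fisher scoring extension of the paper above instead aims to mix quickly at finite step sizes by preconditioning with an estimate of the Fisher information.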
A tutorial on adaptive MCMC
TLDR
This work proposes a series of novel adaptive algorithms which prove to be robust and reliable in practice and reviews criteria and the useful framework of stochastic approximation, which allows one to systematically optimise generally used criteria.
A Stochastic Quasi-Newton Method for Online Convex Optimization
TLDR
Stochastic variants of the well-known BFGS quasi-Newton optimization method, in both full and memory-limited (L-BFGS) forms, are developed for online optimization of convex functions, and asymptotically outperform previous stochastic gradient methods for parameter estimation in conditional random fields.
Probabilistic Inference Using Markov Chain Monte Carlo Methods
TLDR
The role of probabilistic inference in artificial intelligence is outlined, the theory of Markov chains is presented, and various Markov chain Monte Carlo algorithms are described, along with a number of supporting techniques.
Riemann manifold Langevin and Hamiltonian Monte Carlo methods
TLDR
The methodology proposed automatically adapts to the local structure when simulating paths across this manifold, providing highly efficient convergence and exploration of the target density, and substantial improvements in the time‐normalized effective sample size are reported when compared with alternative sampling approaches.
Riemann Manifold Langevin and Hamiltonian Monte Carlo
This paper proposes Metropolis adjusted Langevin and Hamiltonian Monte Carlo sampling methods defined on the Riemann manifold to resolve the shortcomings of existing Monte Carlo algorithms when sampling from target densities that may be high dimensional and exhibit strong correlations.
Maximum likelihood estimation using the empirical Fisher information matrix
TLDR
The introduction of an additional stochastic component into response models and incomplete data problems is shown to greatly increase the range of situations in which the estimator can be employed.
Low Rank Updates for the Cholesky Decomposition
TLDR
This note shows how the Cholesky decomposition can be updated to incorporate low-rank additions or downdated for low-rank subtractions, and discusses a special case of an indefinite update of rank two.
Classification using discriminative restricted Boltzmann machines
TLDR
This paper presents an evaluation of different learning algorithms for RBMs which aim at introducing a discriminative component to RBM training and improving their performance as classifiers, and demonstrates how discriminative RBMs can also be successfully employed in a semi-supervised setting.
The Tradeoffs of Large Scale Learning
This contribution develops a theoretical framework that takes into account the effect of approximate optimization on learning algorithms. The analysis shows distinct tradeoffs for the case of small-scale and large-scale learning problems.