Stochastic Gradient Annealed Importance Sampling for Efficient Online Marginal Likelihood Estimation †

@article{Cameron2019StochasticGA,
  title={Stochastic Gradient Annealed Importance Sampling for Efficient Online Marginal Likelihood Estimation †},
  author={Scott A. Cameron and Hans C. Eggers and Steve Kroon},
  journal={Entropy},
  year={2019},
  volume={21}
}
We consider estimating the marginal likelihood in settings with independent and identically distributed (i.i.d.) data. We propose estimating the predictive distributions in a sequential factorization of the marginal likelihood in such settings by using stochastic gradient Markov Chain Monte Carlo techniques. This approach is far more efficient than traditional marginal likelihood estimation techniques such as nested sampling and annealed importance sampling due to its use of mini-batches to… 
1 Citation

Variational inference as an alternative to MCMC for parameter estimation and model selection

Variational inference is found to be much faster than MCMC and nested sampling techniques for most of these problems while providing competitive results. The work also derives a new approximate evidence estimate based on the variational posterior and an importance sampling technique called posterior-weighted importance sampling.

References

A Sequential Marginal Likelihood Approximation Using Stochastic Gradients

This work estimates the marginal likelihood via a sequential decomposition into a product of predictive distributions $p(y_n \mid y_{<n})$.
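
The decomposition described above can be written out as follows. This is a sketch under stated assumptions, not the paper's own notation: the symbols $N$ (number of observations), $\theta$ (model parameters) and $M$ (number of posterior samples) are introduced here for illustration, and the integral form assumes the observations are i.i.d. given $\theta$.

```latex
\log p(y_{1:N}) = \sum_{n=1}^{N} \log p(y_n \mid y_{<n}),
\qquad
p(y_n \mid y_{<n})
  = \int p(y_n \mid \theta)\, p(\theta \mid y_{<n})\, d\theta
  \approx \frac{1}{M} \sum_{m=1}^{M} p(y_n \mid \theta_m),
\quad \theta_m \sim p(\theta \mid y_{<n}).
```

Each predictive term is an expectation under the posterior given the preceding observations, so it can be approximated by averaging the likelihood of the new observation over samples from that posterior.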

Sandwiching the marginal likelihood using bidirectional Monte Carlo

This work presents bidirectional Monte Carlo, a technique for obtaining accurate log marginal likelihood (log-ML) estimates on data simulated from a model. It obtains stochastic lower bounds using annealed importance sampling or sequential Monte Carlo, and stochastic upper bounds by running these same algorithms in reverse, starting from an exact posterior sample.

Annealed importance sampling

It is shown how one can use the Markov chain transitions for such an annealing sequence to define an importance sampler, which can be seen as a generalization of a recently-proposed variant of sequential importance sampling.
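
The idea summarized above can be illustrated on a one-dimensional toy problem. The following is a minimal sketch, not the implementation from any of the listed papers: the function name `ais_log_z`, the geometric bridge between the two densities, the random-walk Metropolis transition, and all default parameter values are assumptions chosen for illustration.

```python
import math
import random


def ais_log_z(log_f0, log_f1, sample_f0, n_chains=500, n_temps=200,
              step=0.5, seed=0):
    """Estimate log(Z1 / Z0) by annealing from f0 (beta=0) to f1 (beta=1)."""
    rng = random.Random(seed)
    betas = [j / n_temps for j in range(n_temps + 1)]

    def log_f(x, beta):
        # Geometric bridge between the two unnormalized densities.
        return (1.0 - beta) * log_f0(x) + beta * log_f1(x)

    log_weights = []
    for _ in range(n_chains):
        x = sample_f0(rng)  # exact draw from the starting distribution
        log_w = 0.0
        for b_prev, b in zip(betas[:-1], betas[1:]):
            # Importance weight increment f_b(x) / f_{b_prev}(x).
            log_w += (b - b_prev) * (log_f1(x) - log_f0(x))
            # One random-walk Metropolis step that leaves f_b invariant.
            prop = x + step * rng.gauss(0.0, 1.0)
            if math.log(1.0 - rng.random()) < log_f(prop, b) - log_f(x, b):
                x = prop
        log_weights.append(log_w)

    # Log of the average weight, computed stably (log-mean-exp).
    m = max(log_weights)
    return m + math.log(sum(math.exp(lw - m) for lw in log_weights) / n_chains)


# Toy check: f0 is a normalized N(0, 1); f1 is an unnormalized N(1, 0.5^2),
# whose normalizing constant is 0.5 * sqrt(2 * pi).
log_f0 = lambda x: -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)
log_f1 = lambda x: -0.5 * ((x - 1.0) / 0.5) ** 2
estimate = ais_log_z(log_f0, log_f1, lambda rng: rng.gauss(0.0, 1.0))
exact = math.log(0.5 * math.sqrt(2.0 * math.pi))
```

Because the starting density is normalized, `estimate` approximates the log normalizing constant of the target directly and can be compared with `exact`. More intermediate temperatures or more chains reduce the variance at higher cost.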

Elements of Sequential Monte Carlo

This tutorial reviews sequential Monte Carlo, a random-sampling-based class of methods for approximate inference, and discusses the SMC estimate of the normalizing constant, how this can be used for pseudo-marginal inference and inference evaluation.

Nested sampling for general Bayesian computation

Nested sampling estimates directly how the likelihood function relates to prior mass. The evidence (alternatively the marginal likelihood, marginal density of the data, or the prior predictive) is

A Complete Recipe for Stochastic Gradient MCMC

This paper provides a general recipe for constructing MCMC samplers, including stochastic gradient versions, based on continuous Markov processes specified via two matrices, and uses the recipe to straightforwardly propose a new state-adaptive sampler: stochastic gradient Riemann Hamiltonian Monte Carlo (SGRHMC).

Bayesian Learning via Stochastic Gradient Langevin Dynamics

In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic
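
The mini-batch update with injected noise can be sketched on a toy model. This is a hedged illustration only: the Gaussian model with unit observation variance, the constant step size, the function name `sgld_gaussian_mean`, and all default values are assumptions, not taken from the paper.

```python
import math
import random


def sgld_gaussian_mean(y, batch_size=50, step=1e-4, n_steps=5000,
                       prior_var=100.0, seed=0):
    """Sample the mean theta of N(theta, 1) data under a N(0, prior_var) prior."""
    rng = random.Random(seed)
    n_data = len(y)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        batch = rng.sample(y, batch_size)  # mini-batch without replacement
        grad_log_prior = -theta / prior_var
        # Mini-batch gradient, rescaled to estimate the full-data gradient.
        grad_log_lik = (n_data / batch_size) * sum(yi - theta for yi in batch)
        # Gaussian noise with variance equal to the step size.
        noise = math.sqrt(step) * rng.gauss(0.0, 1.0)
        theta += 0.5 * step * (grad_log_prior + grad_log_lik) + noise
        samples.append(theta)
    return samples


# Toy check against the closed-form posterior mean of this conjugate model.
data_rng = random.Random(1)
y = [data_rng.gauss(2.0, 1.0) for _ in range(1000)]
samples = sgld_gaussian_mean(y)
kept = samples[1000:]  # discard an assumed burn-in period
sgld_mean = sum(kept) / len(kept)
posterior_mean = sum(y) / (len(y) + 1.0 / 100.0)
```

A constant step size is used here for simplicity, which leaves some discretization bias. A decreasing step-size schedule is the usual way to reduce it.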

Stochastic Gradient Hamiltonian Monte Carlo

A variant is introduced that uses second-order Langevin dynamics with a friction term that counteracts the effects of the noisy gradient, maintaining the desired target distribution as the invariant distribution.

Sequential Imputations and Bayesian Missing Data Problems

This article introduces an alternative procedure that involves imputing the missing data sequentially and computing appropriate importance sampling weights, and in many applications this new procedure works very well without the need for iterations.

Bayesian Optimization with Robust Bayesian Neural Networks

This work presents a general approach for using flexible parametric models (neural networks) for Bayesian optimization, staying as close to a truly Bayesian treatment as possible and obtaining scalability through stochastic gradient Hamiltonian Monte Carlo, whose robustness is improved via a scale adaptation.