SAME but Different: Fast and High Quality Gibbs Parameter Estimation

@article{Zhao2015SAMEBD,
  title={SAME but Different: Fast and High Quality Gibbs Parameter Estimation},
  author={Huasha Zhao and Biye Jiang and John F. Canny},
  journal={Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
  year={2015}
}
  • Huasha Zhao, Biye Jiang, J. Canny
  • Published 18 September 2014
  • Computer Science
  • Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Gibbs sampling is a workhorse for Bayesian inference but has several limitations when used for parameter estimation, and is often much slower than non-sampling inference methods. SAME (State Augmentation for Marginal Estimation) [15, 8] is an approach to MAP parameter estimation which gives improved parameter estimates over direct Gibbs sampling. SAME can be viewed as cooling the posterior parameter distribution and allows annealed search for the MAP parameters, often yielding very high quality… 
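
The cooling idea is easy to prototype. Below is a minimal sketch of SAME-style Gibbs estimation on a toy two-component Gaussian mixture; it is not the paper's GPU LDA implementation, and the model, hyperparameters, and choice of m are illustrative assumptions. Replicating the latent labels m times while sharing a single copy of the parameters effectively raises the parameter posterior to the m-th power, so larger m concentrates the parameter draws around the MAP.

import numpy as np

rng = np.random.default_rng(0)

# Toy data: two-component Gaussian mixture with known variance and weights.
true_means = np.array([-2.0, 3.0])
sigma = 1.0
x = np.concatenate([rng.normal(true_means[0], sigma, 200),
                    rng.normal(true_means[1], sigma, 300)])
n, K = len(x), 2
pi = np.array([0.5, 0.5])          # mixture weights, assumed known here
mu0, tau0 = 0.0, 10.0              # Normal(mu0, tau0^2) prior on each component mean

def same_gibbs(m, iters=200):
    """Gibbs sampling with m replicated copies of the latent labels z."""
    mu = rng.normal(0.0, 1.0, K)                 # shared parameter state
    z = rng.integers(0, K, size=(m, n))          # m independent label copies
    for _ in range(iters):
        # 1) Resample every label copy given the shared means.
        logp = np.log(pi)[None, :] - 0.5 * ((x[:, None] - mu[None, :]) / sigma) ** 2
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        for j in range(m):
            u = rng.random(n)
            z[j] = (u[:, None] > np.cumsum(p, axis=1)).sum(axis=1)
        # 2) Resample the means from the pooled conditional: the prior is counted
        #    m times and the likelihood terms come from all m label copies, so the
        #    conditional sharpens as m grows.
        for k in range(K):
            cnt = (z == k).sum()                      # counts over all m copies
            s = (x[None, :] * (z == k)).sum()         # matching data sums
            prec = m / tau0**2 + cnt / sigma**2
            mean = (m * mu0 / tau0**2 + s / sigma**2) / prec
            mu[k] = rng.normal(mean, 1.0 / np.sqrt(prec))
    return np.sort(mu)

print("m = 1 :", same_gibbs(1))    # ordinary Gibbs: a draw from the posterior
print("m = 20:", same_gibbs(20))   # cooled: concentrates near the MAP means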

Citations

Fast Parallel SAME Gibbs Sampling on General Discrete Bayesian Networks

An optimized, parallel Gibbs sampler augmented with state replication (SAME, or State Augmented Marginal Estimation) is presented to decrease convergence time, and SAME is found to improve the quality of parameter estimates while accelerating convergence.

Heron Inference for Bayesian Graphical Models

An existing Gibbs sampling method is extended and a new deterministic Heron inference (Heron) is proposed for a family of Bayesian graphical models; Heron can easily assess convergence status and substantially improves running efficiency.

Online but Accurate Inference for Latent Variable Models with Local Gibbs Sampling

A novel inference method is proposed for the frequentist estimation of parameters that adapts MCMC methods to online inference of latent variable models through the proper use of local Gibbs sampling, and it is superior to variational inference in terms of test log-likelihood.

WarpLDA: a Cache Efficient O(1) Algorithm for Latent Dirichlet Allocation

WarpLDA is developed, an LDA sampler which achieves both the best O(1) time complexity per token and the best O(K) scope of random access.

A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation

The experiment showed that the proposed inference gave better predictive performance in terms of test-set perplexity than the inference using the Dirichlet distribution for posterior approximation, and was even better than collapsed Gibbs sampling.

WarpLDA: a Simple and Efficient O(1) Algorithm for Latent Dirichlet Allocation

WarpLDA is a Metropolis-Hastings based algorithm designed to optimize the cache hit rate; it is 5-15x faster than state-of-the-art LDA samplers, implying lower time and monetary cost.

Scaling up Dynamic Topic Models

This paper presents a fast and parallelizable inference algorithm using Gibbs Sampling with Stochastic Gradient Langevin Dynamics that does not make any unwarranted assumptions and is able to learn the largest Dynamic Topic Model to the authors' knowledge.
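
For readers unfamiliar with the SGLD half of that combination, a generic Stochastic Gradient Langevin Dynamics update is sketched below on a toy Gaussian-mean model; this is not the paper's dynamic topic model sampler, and the data, step-size schedule, and constants are illustrative assumptions. Each step follows a minibatch estimate of the log-posterior gradient and injects Gaussian noise whose variance equals the step size.

import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(2.5, 1.0, 10_000)        # observations with an unknown mean
N, batch = len(data), 100
prior_var, lik_var = 100.0, 1.0            # Normal prior and likelihood variances

theta, samples = 0.0, []
for t in range(1, 2001):
    eps = 1e-4 * t ** -0.55                            # decaying step size
    xb = rng.choice(data, batch, replace=False)        # minibatch of observations
    grad_log_prior = -theta / prior_var
    grad_log_lik = (N / batch) * np.sum(xb - theta) / lik_var
    # Langevin step: half the step size times the stochastic gradient of the
    # log posterior, plus Gaussian noise with variance equal to the step size.
    theta += 0.5 * eps * (grad_log_prior + grad_log_lik) + rng.normal(0.0, np.sqrt(eps))
    samples.append(theta)

print("posterior mean estimate:", np.mean(samples[1000:]))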

On extended state-space constructions for Monte Carlo methods

A generic importance-sampling framework is described which admits virtually all Monte Carlo methods, including SMC and MCMC methods, as special cases, and hierarchical combinations of different Monte Carlo schemes can be justified as repeated applications of this framework.

Scalable Training of Hierarchical Topic Models

This paper proposes an efficient partially collapsed Gibbs sampling algorithm for hLDA, as well as an initialization strategy to deal with local optima introduced by tree-structured models, and identifies new system challenges in building scalable systems for HTMs.

Sampled Dense Matrix Multiplication for High-Performance Machine Learning

cuSDDMM, a multi-node GPU-accelerated implementation of Sampled Dense-Dense Matrix Multiplication, is developed and improves significantly over the best currently available GPU implementation of SDDMM (in the BIDMach machine learning library).
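
As background, SDDMM computes the entries of the dense product A @ B.T only at the nonzero positions of a sparse sampling matrix and scales them by that matrix's values. The NumPy/SciPy sketch below is an illustrative reference implementation of the operation itself, not the cuSDDMM GPU kernel; sizes and density are arbitrary.

import numpy as np
from scipy import sparse

rng = np.random.default_rng(2)
m, n, k = 1000, 800, 64
A = rng.standard_normal((m, k))
B = rng.standard_normal((n, k))
S = sparse.random(m, n, density=0.01, format="coo", random_state=2)

# Only the sampled (i, j) positions of A @ B.T are ever computed.
vals = np.einsum("ij,ij->i", A[S.row], B[S.col]) * S.data
C = sparse.coo_matrix((vals, (S.row, S.col)), shape=(m, n))

# Dense check on the sampled positions (feasible only at this toy scale).
assert np.allclose(C.toarray(), (A @ B.T) * S.toarray())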

References

Showing 10 of 20 references

Marginal MAP estimation using Markov chain Monte Carlo

  • C. Robert, A. Doucet, S. Godsill
  • Computer Science
    1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258)
  • 1999
A simple and novel MCMC strategy called state-augmentation for marginal estimation (SAME) is presented, that allows MMAP estimates to be obtained for Bayesian models.

Variational Message Passing

Variational Message Passing is introduced, a general purpose algorithm for applying variational inference to Bayesian Networks, and it can be applied to a very general class of conjugate-exponential models because it uses a factorised variational approximation.

Marginal maximum a posteriori estimation using Markov chain Monte Carlo

A simple and novel MCMC strategy, called State-Augmentation for Marginal Estimation (SAME), which leads to MMAP estimates for Bayesian models and illustrates the simplicity and utility of the approach for missing data interpolation in autoregressive time series and blind deconvolution of impulsive processes.

An Introduction to Variational Methods for Graphical Models

This paper presents a tutorial introduction to the use of variational methods for inference and learning in graphical models (Bayesian networks and Markov random fields), and describes a general framework for generating variational transformations based on convex duality.

A Monte Carlo Implementation of the EM Algorithm and the Poor Man's Data Augmentation Algorithms

Two modifications to the MCEM algorithm (the poor man's data augmentation algorithms), which allow for the calculation of the entire posterior, are presented and serve as diagnostics for the validity of the posterior distribution.

Expectation Propagation for approximate Bayesian inference

Expectation Propagation approximates the belief states by only retaining expectations, such as mean and variance, and iterates until these expectations are consistent throughout the network, which makes it applicable to hybrid networks with discrete and continuous nodes.

Correctness of Local Probability Propagation in Graphical Models with Loops

An analytical relationship is derived between the probabilities computed using local propagation and the correct marginals, and a category of graphical models with loops is identified for which local propagation gives rise to provably optimal maximum a posteriori assignments (although the computed marginals will be incorrect).

Scalable inference in latent variable models

A scalable parallel framework is presented for efficient inference in latent variable models over streaming web-scale data, introducing a novel delta-based aggregation system with a bandwidth-efficient communication protocol, schedule-aware out-of-core storage, and approximate forward sampling to rapidly incorporate new data.

Online Learning for Latent Dirichlet Allocation

An online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA) is developed, based on online stochastic optimization with a natural gradient step, and it is shown to converge to a local optimum of the VB objective function.
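
This is the online VB algorithm that scikit-learn exposes through LatentDirichletAllocation with learning_method="online"; the corpus and parameter choices in the sketch below are illustrative.

from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = fetch_20newsgroups(remove=("headers", "footers", "quotes")).data
X = CountVectorizer(max_features=5000, stop_words="english").fit_transform(docs)

lda = LatentDirichletAllocation(
    n_components=20,
    learning_method="online",   # stochastic natural-gradient updates on minibatches
    batch_size=256,
    learning_decay=0.7,         # step-size decay exponent (kappa)
    random_state=0,
)
lda.fit(X)                       # or lda.partial_fit(batch) for streaming data
print("approximate per-word log likelihood:", lda.score(X) / X.sum())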

Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images

  • S. Geman, D. Geman
  • Physics
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 1984
The analogy between images and statistical mechanics systems is made, and the analogous operation under the posterior distribution yields the maximum a posteriori (MAP) estimate of the image given the degraded observations, creating a highly parallel "relaxation" algorithm for MAP estimation.
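
The annealing connection to SAME's "cooling" is easy to see in miniature: Gibbs sampling under a temperature schedule that decays toward zero drives the chain toward the MAP configuration. Below is a toy 1-D Ising-style denoising sketch under that scheme; the signal, coupling strengths, and schedule are illustrative assumptions, not the paper's image model.

import numpy as np

rng = np.random.default_rng(3)
n, beta_pair, beta_data = 200, 1.0, 1.5
truth = np.where(np.sin(np.linspace(0.1, 6 * np.pi, n)) >= 0, 1.0, -1.0)
obs = np.where(rng.random(n) < 0.2, -truth, truth)     # ~20% of labels flipped

x = obs.copy()
for sweep in range(100):
    T = max(0.05, 2.0 * 0.95 ** sweep)                 # temperature schedule -> 0
    for i in range(n):
        # Local field from the two neighbours plus the data term.
        nb = (x[i - 1] if i > 0 else 0.0) + (x[i + 1] if i < n - 1 else 0.0)
        delta = 2.0 * (beta_pair * nb + beta_data * obs[i])   # E(-1) - E(+1)
        p_plus = 1.0 / (1.0 + np.exp(-delta / T))             # tempered conditional
        x[i] = 1.0 if rng.random() < p_plus else -1.0

print("errors before:", int((obs != truth).sum()), " after:", int((x != truth).sum()))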