# SAME but Different: Fast and High Quality Gibbs Parameter Estimation

@article{Zhao2015SAMEBD, title={SAME but Different: Fast and High Quality Gibbs Parameter Estimation}, author={Huasha Zhao and Biye Jiang and John F. Canny}, journal={Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining}, year={2015} }

Gibbs sampling is a workhorse for Bayesian inference but has several limitations when used for parameter estimation, and is often much slower than non-sampling inference methods. SAME (State Augmentation for Marginal Estimation) [15, 8] is an approach to MAP parameter estimation which gives improved parameter estimates over direct Gibbs sampling. SAME can be viewed as cooling the posterior parameter distribution and allows annealed search for the MAP parameters, often yielding very high quality…
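The replicated-state idea behind SAME can be sketched on a toy model. The snippet below is an illustrative sketch, not the paper's implementation: the two-coin Bernoulli mixture, the deterministic initialization, and all variable names are assumptions made for the example. Keeping `m` independent copies of the latent assignments and pooling their sufficient statistics when sampling the parameters is equivalent to sampling from the marginal parameter posterior raised to the `m`-th power, i.e. a cooled posterior that concentrates near the MAP estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 groups of 20 coin flips; each group comes from one of
# two coins with unknown biases (true values 0.2 and 0.8) to be recovered.
true_theta = np.array([0.2, 0.8])
n_groups, n_flips = 200, 20
assignments = rng.integers(0, 2, size=n_groups)
heads = rng.binomial(n_flips, true_theta[assignments])

def same_gibbs(heads, n_flips, m=10, iters=300, seed=1):
    """SAME-style Gibbs sampling for a two-component Bernoulli mixture.

    Keeps m replicated copies of the latent group assignments. The
    conditional for each coin bias pools counts across all m copies,
    which cools the marginal parameter posterior (raises it to the
    m-th power), so the chain concentrates near the MAP parameters
    instead of wandering over the full posterior.
    """
    rng = np.random.default_rng(seed)
    theta = np.array([0.3, 0.7])  # asymmetric init to break label symmetry
    n = len(heads)
    for _ in range(iters):
        # Per-group posterior over the two components given theta
        # (identical for every replicated copy of the assignments).
        logp = (heads[:, None] * np.log(theta)
                + (n_flips - heads)[:, None] * np.log1p(-theta))
        p1 = 1.0 / (1.0 + np.exp(logp[:, 0] - logp[:, 1]))
        # Draw m independent copies of the latent assignments.
        z = (rng.random((m, n)) < p1[None, :]).astype(int)
        # Cooled conditional for theta: sufficient statistics pooled
        # over all m copies sharpen the Beta conditional roughly m-fold.
        for j in range(2):
            mask = (z == j)
            h = (heads[None, :] * mask).sum()
            t = (n_flips * mask).sum() - h
            theta[j] = rng.beta(1 + h, 1 + t)
    return np.sort(theta)

print(same_gibbs(heads, n_flips))  # estimates close to the true biases
```

Setting `m = 1` recovers ordinary Gibbs sampling; increasing `m` over the course of the run gives the annealed search for the MAP parameters described in the abstract.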

## 32 Citations

### Fast Parallel SAME Gibbs Sampling on General Discrete Bayesian Networks

- Computer Science, ArXiv
- 2015

An optimized, parallel Gibbs sampler augmented with state replication (SAME, or State Augmented Marginal Estimation) is presented to decrease convergence time; the authors find that SAME can improve the quality of parameter estimates while accelerating convergence.

### Heron Inference for Bayesian Graphical Models

- Computer Science, ArXiv
- 2018

An existing Gibbs sampling method is extended, and a new deterministic Heron inference (Heron) is proposed for a family of Bayesian graphical models, which can easily assess convergence status and largely improves running efficiency.

### Online but Accurate Inference for Latent Variable Models with Local Gibbs Sampling

- Computer Science, J. Mach. Learn. Res.
- 2017

A novel inference method is proposed for the frequentist estimation of parameters, adapting MCMC methods to online inference of latent variable models through the proper use of local Gibbs sampling; it is superior to variational inference in terms of test log-likelihoods.

### WarpLDA: a Cache Efficient O(1) Algorithm for Latent Dirichlet Allocation

- Computer Science, Proc. VLDB Endow.
- 2016

WarpLDA is developed, an LDA sampler that achieves both the best O(1) time complexity per token and the best O(K) scope of random access.

### A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation

- Computer Science, ICCSA
- 2016

The experiment showed that the proposed inference gave a better predictive performance in terms of test set perplexity than the inference using the Dirichlet distribution for posterior approximation, and was better even than the collapsed Gibbs sampling.

### WarpLDA: a Simple and Efficient O(1) Algorithm for Latent Dirichlet Allocation

- Computer Science, ArXiv
- 2015

WarpLDA is a Metropolis-Hastings-based algorithm designed to optimize the cache hit rate; it is 5-15x faster than state-of-the-art LDA samplers, reducing both time and monetary cost.

### Scaling up Dynamic Topic Models

- Computer Science, WWW
- 2016

This paper presents a fast and parallelizable inference algorithm using Gibbs Sampling with Stochastic Gradient Langevin Dynamics that does not make any unwarranted assumptions and is able to learn the largest Dynamic Topic Model to the authors' knowledge.

### On extended state-space constructions for Monte Carlo methods

- Computer Science
- 2015

A generic importance-sampling framework is described which admits virtually all Monte Carlo methods, including SMC and MCMC methods, as special cases; hierarchical combinations of different Monte Carlo schemes can be justified as repeated applications of this framework.

### Scalable Training of Hierarchical Topic Models

- Computer Science, Proc. VLDB Endow.
- 2018

This paper proposes an efficient partially collapsed Gibbs sampling algorithm for hLDA, as well as an initialization strategy to deal with local optima introduced by tree-structured models, and identifies new system challenges in building scalable systems for HTMs.

### Sampled Dense Matrix Multiplication for High-Performance Machine Learning

- Computer Science, 2018 IEEE 25th International Conference on High Performance Computing (HiPC)
- 2018

cuSDDMM, a multi-node GPU-accelerated implementation of Sampled Dense-Dense Matrix Multiplication, is developed; it improves significantly over the best currently available GPU implementation of SDDMM (in the BIDMach machine learning library).

## References


### Marginal MAP estimation using Markov chain Monte Carlo

- Computer Science, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
- 1999

A simple and novel MCMC strategy called state-augmentation for marginal estimation (SAME) is presented, that allows MMAP estimates to be obtained for Bayesian models.

### Variational Message Passing

- Computer Science, J. Mach. Learn. Res.
- 2005

Variational Message Passing is introduced, a general-purpose algorithm for applying variational inference to Bayesian networks; because it uses a factorised variational approximation, it can be applied to a very general class of conjugate-exponential models.

### Marginal maximum a posteriori estimation using Markov chain Monte Carlo

- Computer Science, Stat. Comput.
- 2002

A simple and novel MCMC strategy, called State-Augmentation for Marginal Estimation (SAME), which leads to MMAP estimates for Bayesian models and illustrates the simplicity and utility of the approach for missing data interpolation in autoregressive time series and blind deconvolution of impulsive processes.

### An Introduction to Variational Methods for Graphical Models

- Computer Science, Machine Learning
- 2004

This paper presents a tutorial introduction to the use of variational methods for inference and learning in graphical models (Bayesian networks and Markov random fields), and describes a general framework for generating variational transformations based on convex duality.

### A Monte Carlo Implementation of the EM Algorithm and the Poor Man's Data Augmentation Algorithms

- Computer Science
- 1990

Two modifications to the MCEM algorithm (the poor man's data augmentation algorithms), which allow for the calculation of the entire posterior, are presented and serve as diagnostics for the validity of the posterior distribution.

### Expectation Propagation for approximate Bayesian inference

- Computer Science, UAI
- 2001

Expectation Propagation approximates the belief states by retaining only expectations, such as mean and variance, and iterates until these expectations are consistent throughout the network, which makes it applicable to hybrid networks with discrete and continuous nodes.

### Correctness of Local Probability Propagation in Graphical Models with Loops

- Computer Science, Neural Computation
- 2000

An analytical relationship is derived between the probabilities computed using local propagation and the correct marginals and a category of graphical models with loops for which local propagation gives rise to provably optimal maximum a posteriori assignments (although the computed marginals will be incorrect).

### Scalable inference in latent variable models

- Computer Science, WSDM '12
- 2012

A scalable parallel framework for efficient inference in latent variable models over streaming web-scale data is presented, introducing a novel delta-based aggregation system with a bandwidth-efficient communication protocol, schedule-aware out-of-core storage, and approximate forward sampling to rapidly incorporate new data.

### Online Learning for Latent Dirichlet Allocation

- Computer Science, NIPS
- 2010

An online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA) is developed, based on online stochastic optimization with a natural gradient step, and is shown to converge to a local optimum of the VB objective function.

### Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images

- Physics, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 1984

An analogy between images and statistical mechanics systems is developed; the analogous annealing operation under the posterior distribution yields the maximum a posteriori (MAP) estimate of the image given the degraded observations, producing a highly parallel "relaxation" algorithm for MAP estimation.