• Corpus ID: 5056750

# Asynchronous Stochastic Gradient MCMC with Elastic Coupling

@article{Springenberg2016AsynchronousSG,
title={Asynchronous Stochastic Gradient MCMC with Elastic Coupling},
author={Jost Tobias Springenberg and Aaron Klein and Stefan Falkner and Frank Hutter},
journal={ArXiv},
year={2016},
volume={abs/1612.00767}
}
• Published 2 December 2016
• Computer Science
• ArXiv
We consider parallel asynchronous Markov Chain Monte Carlo (MCMC) sampling for problems where we can leverage (stochastic) gradients to define continuous dynamics which explore the target distribution. We outline a solution strategy for this setting based on stochastic gradient Hamiltonian Monte Carlo sampling (SGHMC) which we alter to include an elastic coupling term that ties together multiple MCMC instances. The proposed strategy turns inherently sequential HMC algorithms into asynchronous…
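The elastic-coupling idea can be sketched in a few lines: each SGHMC chain updates a momentum using a (stochastic) gradient with friction and injected noise, and chain and shared center variable are pulled toward each other, in the spirit of elastic averaging. The sketch below is illustrative only: a synchronous loop stands in for asynchronous workers, the target is a toy Gaussian, and all names and constants are assumptions, not the paper's exact algorithm.

```python
import numpy as np

def grad_potential(theta):
    """Gradient of U(theta) = 0.5 * ||theta||^2 (standard Gaussian target).
    A stochastic gradient estimate would replace this in practice."""
    return theta

def elastic_sghmc(n_chains=4, n_steps=2000, eps=0.01, alpha=0.1, rho=0.05,
                  dim=2, seed=0):
    """Sketch of SGHMC chains tied together by an elastic coupling term.
    Names and constants are illustrative, not the paper's update rule."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_chains, dim))   # one position per chain
    v = np.zeros((n_chains, dim))              # one momentum per chain
    center = theta.mean(axis=0)                # shared center variable
    out = []
    for _ in range(n_steps):
        for i in range(n_chains):              # would run asynchronously in parallel
            grad = grad_potential(theta[i])
            noise = rng.normal(size=dim) * np.sqrt(2.0 * alpha * eps)
            # SGHMC momentum step with friction, plus an elastic pull
            # toward the shared center:
            v[i] = ((1.0 - alpha) * v[i] - eps * grad
                    - rho * (theta[i] - center) + noise)
            theta[i] = theta[i] + v[i]
            # the center moves symmetrically toward the updated chain
            center = center + rho * (theta[i] - center)
        out.append(theta.copy())
    return np.array(out)

samples = elastic_sghmc()
print(samples.shape)  # (2000, 4, 2)
```

With `rho = 0` the coupling vanishes and the chains reduce to independent SGHMC runs.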
## 1 Citation

### Asynchronous Stochastic Quasi-Newton MCMC for Non-Convex Optimization

• Computer Science, Mathematics
ICML
• 2018
This study develops an asynchronous-parallel stochastic L-BFGS algorithm for non-convex optimization that achieves an ergodic convergence rate of $\mathcal{O}(1/\sqrt{N})$ and can achieve a linear speedup under certain conditions.

## References

SHOWING 1-10 OF 15 REFERENCES

### Parallel Stochastic Gradient Markov Chain Monte Carlo for Matrix Factorisation Models

• Computer Science
• 2015
A distributed Markov Chain Monte Carlo method based on stochastic gradient Langevin dynamics (SGLD), which achieves high performance by exploiting the conditional independence structure of the MF models to sub-sample data in a systematic manner so as to allow parallelisation and distributed computation.

### A Complete Recipe for Stochastic Gradient MCMC

• Computer Science, Mathematics
NIPS
• 2015
This paper provides a general recipe for constructing MCMC samplers--including stochastic gradient versions--based on continuous Markov processes specified via two matrices, and uses the recipe to straightforwardly propose a new state-adaptive sampler: stochastic gradient Riemann Hamiltonian Monte Carlo (SGRHMC).
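As a reading aid, the recipe's continuous dynamics take the following standard form, with $z$ the full state (parameters plus auxiliary variables), $H$ the Hamiltonian, $D(z)$ positive semi-definite, and $Q(z)$ skew-symmetric; this is the familiar formulation from that line of work, reproduced as a sketch rather than a quotation from the paper:

```latex
\begin{align*}
  \mathrm{d}z &= \left[ -\bigl(D(z) + Q(z)\bigr)\,\nabla H(z) + \Gamma(z) \right] \mathrm{d}t
                 + \sqrt{2\,D(z)}\;\mathrm{d}W(t), \\
  \Gamma_i(z) &= \sum_{j} \frac{\partial}{\partial z_j}\bigl(D_{ij}(z) + Q_{ij}(z)\bigr).
\end{align*}
```

Particular choices of $D$ and $Q$ recover known samplers; for example, SGLD corresponds to $z = \theta$, $D = I$, $Q = 0$.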

### Distributed Stochastic Gradient MCMC

• Computer Science
ICML
• 2014
This work argues that stochastic gradient MCMC algorithms are particularly suited for distributed inference because individual chains can draw mini-batches from their local pool of data for a flexible amount of time before jumping to or syncing with other chains, which greatly reduces communication overhead and allows adaptive load balancing.

### Asymptotically Exact, Embarrassingly Parallel MCMC

• Computer Science
UAI
• 2014
This paper presents a parallel Markov chain Monte Carlo (MCMC) algorithm in which subsets of data are processed independently, with very little communication, and proves that it generates asymptotically exact samples and empirically demonstrates its ability to parallelize burn-in and sampling in several models.
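The combination step in this embarrassingly parallel scheme is easy to illustrate under a Gaussian approximation: fit a Gaussian to each machine's subposterior samples and multiply the densities, which amounts to precision-weighted averaging. The function below is a hypothetical sketch of that simplest parametric combination rule, not the paper's full method; `s1` and `s2` stand in for samples drawn on two machines.

```python
import numpy as np

def combine_gaussian_subposteriors(subposterior_samples):
    """Combine subposterior samples under a Gaussian approximation.

    Each subposterior is approximated by a Gaussian fitted to its samples;
    their product is again Gaussian, with precision equal to the sum of
    the subposterior precisions. Names here are illustrative.
    """
    precisions, weighted_means = [], []
    for s in subposterior_samples:             # s: (n_samples, dim) array
        mu = s.mean(axis=0)
        cov = np.atleast_2d(np.cov(s, rowvar=False))
        prec = np.linalg.inv(cov)
        precisions.append(prec)
        weighted_means.append(prec @ mu)
    prec_combined = np.sum(precisions, axis=0)
    cov_combined = np.linalg.inv(prec_combined)
    mean_combined = cov_combined @ np.sum(weighted_means, axis=0)
    return mean_combined, cov_combined

# Two machines each holding samples from a 1-D Gaussian subposterior:
rng = np.random.default_rng(1)
s1 = rng.normal(0.0, 1.0, size=(5000, 1))
s2 = rng.normal(2.0, 1.0, size=(5000, 1))
mean, cov = combine_gaussian_subposteriors([s1, s2])
print(mean)  # close to [1.0]: precision-weighted average of the two means
```

Because the two subposteriors here have equal variance, the combined mean lands halfway between their means.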

### Stochastic Gradient MCMC with Stale Gradients

• Computer Science
NIPS
• 2016
Stochastic gradient MCMC (SG-MCMC) has played an important role in large-scale Bayesian learning, with well-developed theoretical convergence properties. In such applications of SG-MCMC, it is…

### Variational Consensus Monte Carlo

• Computer Science
NIPS
• 2015
Practitioners of Bayesian statistics have long depended on Markov chain Monte Carlo (MCMC) to obtain samples from intractable posterior distributions. Unfortunately, MCMC algorithms are typically…

### Bayesian Learning via Stochastic Gradient Langevin Dynamics

• Computer Science
ICML
• 2011
In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic…
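The update described above is compact enough to write out. The sketch below applies SGLD to a toy posterior over the mean of a unit-variance Gaussian; the model, step size, and helper names are illustrative assumptions, not the paper's experiments.

```python
import numpy as np

def sgld_step(theta, grad_log_prior, grad_log_lik_batch, n_data, batch_size,
              eps, rng):
    """One SGLD update: a stochastic gradient step on the log posterior
    (mini-batch gradient rescaled by n_data / batch_size) plus Gaussian
    noise with variance eps, so the iterates sample rather than optimize."""
    grad_est = grad_log_prior + (n_data / batch_size) * grad_log_lik_batch
    noise = rng.normal(size=theta.shape) * np.sqrt(eps)
    return theta + 0.5 * eps * grad_est + noise

# Toy posterior: data ~ N(3, 1), prior theta ~ N(0, 10).
rng = np.random.default_rng(0)
data = rng.normal(3.0, 1.0, size=1000)
theta = np.zeros(1)
trace = []
for _ in range(5000):
    batch = rng.choice(data, size=32)
    grad_lik = np.sum(batch - theta)   # d/dtheta of sum log N(x | theta, 1)
    grad_prior = -theta / 10.0         # d/dtheta of log N(theta | 0, 10)
    theta = sgld_step(theta, grad_prior, grad_lik, len(data), 32,
                      eps=1e-3, rng=rng)
    trace.append(theta[0])
print(np.mean(trace[1000:]))  # close to the posterior mean, roughly 3.0
```

The rescaling by `n_data / batch_size` makes the mini-batch gradient an unbiased estimate of the full-data gradient, which is what lets small batches stand in for the whole dataset.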

### MCMC Using Hamiltonian Dynamics

Hamiltonian dynamics can be used to produce distant proposals for the Metropolis algorithm, thereby avoiding the slow exploration of the state space that results from the diffusive behaviour of…
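The distant-proposal mechanism this reference describes can be sketched with a standard leapfrog integrator and a Metropolis correction on the Hamiltonian; everything below (names, step size, toy Gaussian target) is an illustrative reconstruction of textbook HMC, not code from the reference.

```python
import numpy as np

def leapfrog(theta, p, grad_U, eps, n_steps):
    """Leapfrog integration of Hamiltonian dynamics: half momentum step,
    alternating full position/momentum steps, final half momentum step."""
    p = p - 0.5 * eps * grad_U(theta)
    for _ in range(n_steps - 1):
        theta = theta + eps * p
        p = p - eps * grad_U(theta)
    theta = theta + eps * p
    p = p - 0.5 * eps * grad_U(theta)
    return theta, -p                    # momentum flip keeps the proposal reversible

def hmc(theta0, U, grad_U, eps=0.1, n_leapfrog=20, n_iter=1000, seed=0):
    """Plain HMC: resample momentum, integrate, accept with prob min(1, e^{-dH})."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    out = []
    for _ in range(n_iter):
        p = rng.normal(size=theta.shape)
        h0 = U(theta) + 0.5 * p @ p
        theta_prop, p_prop = leapfrog(theta, p, grad_U, eps, n_leapfrog)
        h1 = U(theta_prop) + 0.5 * p_prop @ p_prop
        if rng.random() < np.exp(min(0.0, h0 - h1)):
            theta = theta_prop
        out.append(theta.copy())
    return np.array(out)

# Target: 2-D standard Gaussian, U(theta) = 0.5 * ||theta||^2.
samples = hmc(np.zeros(2), U=lambda t: 0.5 * t @ t, grad_U=lambda t: t)
print(samples.shape)  # (1000, 2)
```

Because each proposal integrates the dynamics for `eps * n_leapfrog` units of time, successive samples decorrelate far faster than the one-small-step-at-a-time random walk the quote contrasts against.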

### Large-Scale Distributed Bayesian Matrix Factorization using Stochastic Gradient MCMC

• Ahn
• Computer Science
• 2015
This paper proposes a scalable distributed Bayesian matrix factorization algorithm, based on Distributed Stochastic Gradient Langevin Dynamics, that can not only match the prediction accuracy of standard MCMC methods like Gibbs sampling, but at the same time is as fast and simple as stochastic gradient descent.

### Deep learning with Elastic Averaging SGD

• Computer Science
NIPS
• 2015
Experiments demonstrate that the new algorithm accelerates the training of deep architectures compared to DOWNPOUR and other common baseline approaches, and is furthermore very communication efficient.
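The update behind elastic averaging is simple to sketch: each worker takes a local gradient step and exchanges a symmetric elastic force with a center variable. The synchronous round and constants below are illustrative simplifications of the EASGD scheme (the paper writes the coupling strength as alpha = eta * rho and also gives asynchronous and momentum variants), not the paper's exact algorithm.

```python
import numpy as np

def easgd_round(workers, center, grad_fn, lr=0.05, rho=0.1):
    """One synchronous, simplified round of elastic averaging.

    Each worker moves down its local gradient and is pulled toward the
    center; the center receives the equal and opposite elastic force.
    """
    for i in range(len(workers)):
        elastic = rho * (workers[i] - center)
        workers[i] = workers[i] - lr * grad_fn(workers[i]) - elastic
        center = center + elastic
    return workers, center

# Toy objective 0.5 * ||x||^2, so the gradient is the identity:
rng = np.random.default_rng(0)
workers = [rng.normal(size=3) for _ in range(4)]
center = np.zeros(3)
for _ in range(200):
    workers, center = easgd_round(workers, center, grad_fn=lambda x: x)
print(np.linalg.norm(center))  # near 0: replicas and center agree at the minimum
```

The symmetric force means workers explore around the center rather than being hard-synchronized to it, which is the communication-efficiency argument the abstract alludes to.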