Corpus ID: 5056750

Asynchronous Stochastic Gradient MCMC with Elastic Coupling

@article{Springenberg2016AsynchronousSG,
  title={Asynchronous Stochastic Gradient MCMC with Elastic Coupling},
  author={Jost Tobias Springenberg and Aaron Klein and Stefan Falkner and Frank Hutter},
  journal={ArXiv},
  year={2016},
  volume={abs/1612.00767}
}
We consider parallel asynchronous Markov Chain Monte Carlo (MCMC) sampling for problems where we can leverage (stochastic) gradients to define continuous dynamics which explore the target distribution. We outline a solution strategy for this setting based on stochastic gradient Hamiltonian Monte Carlo sampling (SGHMC) which we alter to include an elastic coupling term that ties together multiple MCMC instances. The proposed strategy turns inherently sequential HMC algorithms into asynchronous… 
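
The coupled update itself is not reproduced on this page, but the idea can be sketched in a few lines of Python. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: it takes a standard SGHMC step (step size eps, a friction term) and adds a hypothetical elastic term of strength rho that pulls each chain toward a shared center variable, in the spirit of elastic averaging; all names and the center update are assumptions.

  import numpy as np

  def sghmc_elastic_step(theta, momentum, center, stochastic_grad_U,
                         eps=1e-3, friction=0.1, rho=0.01, rng=np.random):
      # One SGHMC step for a single chain, with an assumed elastic coupling
      # term rho * (theta - center) tying the chain to a shared center.
      grad = stochastic_grad_U(theta)               # noisy gradient of the potential
      noise = rng.normal(size=theta.shape) * np.sqrt(2.0 * friction * eps)
      momentum = (momentum
                  - eps * grad                      # gradient force
                  - eps * friction * momentum       # friction
                  - eps * rho * (theta - center)    # elastic coupling (assumed form)
                  + noise)                          # injected noise
      theta = theta + eps * momentum
      return theta, momentum

  def update_center(center, thetas, eps=1e-3, rho=0.01):
      # Elastic-averaging-style center update (assumed): move the shared
      # center toward the average of all chain parameters.
      return center + eps * rho * sum(t - center for t in thetas)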

Citations

Asynchronous Stochastic Quasi-Newton MCMC for Non-Convex Optimization

TLDR
This study develops an asynchronous-parallel stochastic L-BFGS algorithm for non-convex optimization that achieves an ergodic convergence rate of ${\cal O}(1/\sqrt{N})$, where $N$ is the number of iterations, and can achieve a linear speedup under certain conditions.

References

Showing 1–10 of 15 references

Parallel Stochastic Gradient Markov Chain Monte Carlo for Matrix Factorisation Models

TLDR
A distributed Markov Chain Monte Carlo method based on stochastic gradient Langevin dynamics (SGLD), which achieves high performance by exploiting the conditional independence structure of the MF models to sub-sample data in a systematic manner so as to allow parallelisation and distributed computation.
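
The conditional-independence idea can be illustrated with a generic block scheme (an assumption about the general approach, not this paper's exact partitioning): split the rating matrix into row and column groups, and update in parallel only blocks that share no row or column group, since those touch disjoint user and item factors. A minimal sketch:

  def independent_blocks(num_row_groups, num_col_groups, shift):
      # Pick one block per row group so that no two chosen blocks share a row
      # or column group (assumes num_col_groups >= num_row_groups); such blocks
      # touch disjoint parameters and can be processed in parallel.
      return [(r, (r + shift) % num_col_groups) for r in range(num_row_groups)]

  # Example: independent_blocks(4, 4, shift=1) -> [(0, 1), (1, 2), (2, 3), (3, 0)]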

A Complete Recipe for Stochastic Gradient MCMC

TLDR
This paper provides a general recipe for constructing MCMC samplers, including stochastic gradient versions, based on continuous Markov processes specified via two matrices, and uses the recipe to straightforwardly propose a new state-adaptive sampler: stochastic gradient Riemann Hamiltonian Monte Carlo (SGRHMC).
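
As a reminder (reconstructed here from memory of the cited paper, so the notation is indicative), the recipe writes a sampler as a stochastic differential equation over the state $z$ with energy $H(z)$, built from a positive semi-definite diffusion matrix $D(z)$ and a skew-symmetric curl matrix $Q(z)$:

  $\mathrm{d}z = \big[-(D(z) + Q(z))\,\nabla H(z) + \Gamma(z)\big]\,\mathrm{d}t + \sqrt{2 D(z)}\,\mathrm{d}W(t), \qquad \Gamma_i(z) = \sum_j \frac{\partial}{\partial z_j}\big(D_{ij}(z) + Q_{ij}(z)\big)$

Any such choice of $D$ and $Q$ leaves the target distribution invariant; the stochastic gradient variants simply replace $\nabla H$ with a noisy mini-batch estimate.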

Distributed Stochastic Gradient MCMC

TLDR
This work argues that stochastic gradient MCMC algorithms are particularly suited for distributed inference because individual chains can draw mini-batches from their local pool of data for a flexible amount of time before jumping to or syncing with other chains, which greatly reduces communication overhead and allows adaptive load balancing.
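
A bare-bones sketch of that pattern (illustrative only; the worker objects, round-robin order, and sgld_step below are assumptions, not the paper's actual scheme):

  def distributed_chain(workers, state, rounds, local_steps, sgld_step):
      # One MCMC chain works on a single worker's local data shard for a
      # flexible number of steps, then jumps to the next worker; only the
      # chain state crosses the network, keeping communication overhead low.
      for r in range(rounds):
          worker = workers[r % len(workers)]
          for _ in range(local_steps):
              state = sgld_step(state, worker.local_data)
          # jumping to the next worker is the only communication point
      return state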

Asymptotically Exact, Embarrassingly Parallel MCMC

TLDR
This paper presents a parallel Markov chain Monte Carlo (MCMC) algorithm in which subsets of data are processed independently, with very little communication; it proves that the algorithm generates asymptotically exact samples and empirically demonstrates its ability to parallelize burn-in and sampling in several models.
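
The factorisation behind this approach (written here from memory, so treat the notation as indicative) splits the data $x$ into $M$ disjoint subsets $x^{(1)}, \dots, x^{(M)}$, lets each machine sample its own subposterior, and recovers the full posterior as a product:

  $p_m(\theta) \propto p(\theta)^{1/M}\, p(x^{(m)} \mid \theta), \qquad p(\theta \mid x) \propto \prod_{m=1}^{M} p_m(\theta)$

Each machine samples its $p_m(\theta)$ with no communication, and the subposterior samples are combined afterwards, for example via a product of density estimates.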

Stochastic Gradient MCMC with Stale Gradients

Stochastic gradient MCMC (SG-MCMC) has played an important role in large-scale Bayesian learning, with well-developed theoretical convergence properties. In such applications of SG-MCMC, it is…

Variational Consensus Monte Carlo

Practitioners of Bayesian statistics have long depended on Markov chain Monte Carlo (MCMC) to obtain samples from intractable posterior distributions. Unfortunately, MCMC algorithms are typically…

Bayesian Learning via Stochastic Gradient Langevin Dynamics

In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic…
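
For reference, the SGLD update (reconstructed here; it requires an appropriately decaying step size $\epsilon_t$) adds Gaussian noise, scaled to the step size, to a stochastic gradient step on the log posterior estimated from a mini-batch of $n$ points out of $N$:

  $\Delta\theta_t = \frac{\epsilon_t}{2}\big(\nabla \log p(\theta_t) + \frac{N}{n}\sum_{i=1}^{n} \nabla \log p(x_{t_i} \mid \theta_t)\big) + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, \epsilon_t)$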

MCMC Using Hamiltonian Dynamics

Hamiltonian dynamics can be used to produce distant proposals for the Metropolis algorithm, thereby avoiding the slow exploration of the state space that results from the diffusive behaviour of…
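
The distant proposals come from simulating the dynamics with a leapfrog integrator; a minimal sketch, assuming an identity mass matrix and a user-supplied gradient of the potential energy $U(q) = -\log p(q)$:

  def leapfrog(q, p, grad_U, eps, num_steps):
      # Leapfrog integration of Hamiltonian dynamics (identity mass matrix).
      # q: position (the parameters), p: momentum; works on NumPy arrays or floats.
      p = p - 0.5 * eps * grad_U(q)        # initial half step for momentum
      for _ in range(num_steps - 1):
          q = q + eps * p                  # full step for position
          p = p - eps * grad_U(q)          # full step for momentum
      q = q + eps * p                      # last full step for position
      p = p - 0.5 * eps * grad_U(q)        # final half step for momentum
      return q, -p                         # negate momentum so the proposal is reversible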

Large-Scale Distributed Bayesian Matrix Factorization using Stochastic Gradient MCMC

  • Ahn et al., 2015
TLDR
This paper proposes a scalable distributed Bayesian matrix factorization algorithm, based on Distributed Stochastic Gradient Langevin Dynamics, that not only matches the prediction accuracy of standard MCMC methods like Gibbs sampling but is also as fast and simple as stochastic gradient descent.

Deep learning with Elastic Averaging SGD

TLDR
Experiments demonstrate that the new algorithm accelerates the training of deep architectures compared to DOWNPOUR and other common baseline approaches, and is furthermore highly communication-efficient.
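
The elastic averaging update that inspires the coupling term in the main paper can be written (reconstructed from memory of this reference; $\eta$ is the learning rate, $\rho$ the coupling strength, $g_i^t$ worker $i$'s stochastic gradient, and $\tilde{x}$ the shared center variable) as

  $x_i^{t+1} = x_i^{t} - \eta\big(g_i^{t} + \rho\,(x_i^{t} - \tilde{x}^{t})\big), \qquad \tilde{x}^{t+1} = \tilde{x}^{t} + \eta\,\rho \sum_{i=1}^{p} (x_i^{t} - \tilde{x}^{t})$

so each worker is pulled toward the center and the center toward the workers' average; swapping the SGD step for an SGHMC step yields the kind of elastically coupled sampler sketched in the abstract above.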