Corpus ID: 4781478

Stochastic Gradient Monomial Gamma Sampler

@inproceedings{Zhang2017StochasticGM,
  title={Stochastic Gradient Monomial Gamma Sampler},
  author={Yizhe Zhang and Changyou Chen and Zhe Gan and Ricardo Henao and Lawrence Carin},
  booktitle={ICML},
  year={2017}
}
Recent advances in stochastic gradient techniques have made it possible to estimate posterior distributions from large datasets via Markov Chain Monte Carlo (MCMC). However, when the target posterior is multimodal, mixing performance is often poor. This results in inadequate exploration of the posterior distribution. A framework is proposed to improve the sampling efficiency of stochastic gradient MCMC, based on Hamiltonian Monte Carlo. A generalized kinetic function is leveraged, delivering… 

A convergence analysis for a class of practical variance-reduction stochastic gradient MCMC
TLDR
This paper proves that at the beginning of an SG-MCMC algorithm, a larger minibatch size leads to a faster decrease of the mean squared error bound, and develops the theory to prove that the algorithm induces a faster convergence rate than standard SG-MCMC.
AMAGOLD: Amortized Metropolis Adjustment for Efficient Stochastic Gradient MCMC
TLDR
This work proposes a novel second-order SG-MCMC algorithm, AMAGOLD, that infrequently uses Metropolis-Hastings (M-H) corrections to remove bias, and proves that AMAGOLD converges to the target distribution with a fixed, rather than a diminishing, step size, and that its convergence rate is at most a constant factor slower than a full-batch baseline.
Modified Hamiltonian Monte Carlo for Bayesian inference
TLDR
It is shown that the performance of HMC can be significantly improved by incorporating importance sampling and an irreversible part of the dynamics into a chain; the resulting method is called Mix & Match Hamiltonian Monte Carlo (MMHMC).
Learning Structural Weight Uncertainty for Sequential Decision-Making
TLDR
This work proposes efficient posterior learning of structural weight uncertainty, within an SVGD framework, by employing matrix variate Gaussian priors on NN parameters, and investigates the learned structural uncertainty in sequential decision-making problems, including contextual bandits and reinforcement learning.
Understanding MCMC Dynamics as Flows on the Wasserstein Space
TLDR
A theoretical framework is proposed that recognizes a general MCMC dynamics as the fiber-gradient Hamiltonian flow on the Wasserstein space of a fiber-Riemannian Poisson manifold and enables ParVI simulation of MCMC Dynamics, which enriches the ParVI family with more efficient dynamics, and also adapts ParVI advantages to MCMCs.
Continuous-Time Flows for Deep Generative Models
TLDR
This paper discretizes the CTF to make training feasible, and develops theory on the approximation error, which is then adopted to distill knowledge from a CTF to an efficient inference network.
Continuous-Time Flows for Efficient Inference and Density Estimation
TLDR
This paper proposes the concept of continuous-time flows (CTFs), a family of diffusion-based methods that are able to asymptotically approach a target distribution and demonstrates promising performance of the proposed CTF framework, compared to related techniques.
A Continuous Mapping For Augmentation Design
TLDR
This work poses the ADA as a continuous optimization problem over the parameters of the augmentation distribution; and uses Stochastic Gradient Langevin Dynamics to learn and sample augmentations, which opens avenues for utilizing the vast efficient gradient-based algorithms available for continuous optimization problems.
Improving human mobility identification with trajectory augmentation
TLDR
A Trajectory Generative Adversarial Network (TGAN) is introduced as an approach to enable learning users' motion patterns and location distribution, and to eventually identify human mobility.
Improving Human Action Recognition through Hierarchical Neural Network Classifiers
TLDR
This work implements a CNN-based hierarchical recognition approach to recognize the 20 most difficult-to-recognize actions from the Kinetics dataset and shows that the application of the method significantly improves the quality of recognition for these actions.

References

SHOWING 1-10 OF 43 REFERENCES
A Complete Recipe for Stochastic Gradient MCMC
TLDR
This paper provides a general recipe for constructing MCMC samplers, including stochastic gradient versions, based on continuous Markov processes specified via two matrices, and uses the recipe to straightforwardly propose a new state-adaptive sampler: stochastic gradient Riemann Hamiltonian Monte Carlo (SGRHMC).
Bayesian Learning via Stochastic Gradient Langevin Dynamics
In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic
Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization
TLDR
The theoretical analysis suggests the proposed simulated annealing approach converges close to the global optima, and recent SG-MCMC methods are extended with two key components: i) adaptive preconditioners (as in ADAgrad or RMSprop), and ii) adaptive element-wise momentum weights.
Bayesian Sampling Using Stochastic Gradient Thermostats
TLDR
This work shows that one can leverage a small number of additional variables to stabilize momentum fluctuations induced by the unknown noise in dynamics-based sampling methods.
High-Order Stochastic Gradient Thermostats for Bayesian Learning of Deep Models
TLDR
Extensive experiments on two canonical models and their deep extensions demonstrate that the proposed scheme improves general Bayesian posterior sampling, particularly for deep models, and is more accurate, robust, and converges faster.
Stochastic Gradient Hamiltonian Monte Carlo
TLDR
A variant that uses second-order Langevin dynamics with a friction term that counteracts the effects of the noisy gradient, maintaining the desired target distribution as the invariant distribution is introduced.
Relativistic Monte Carlo
TLDR
Relativistic Hamiltonian Monte Carlo is proposed, a version of HMC based on relativistic dynamics that introduces a maximum velocity on particles. Stochastic gradient versions of the algorithm are derived, and the resulting algorithms are shown to bear interesting relationships to gradient clipping, RMSprop, Adagrad and Adam, popular optimisation methods in deep learning.
On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators
TLDR
This paper considers general SG-MCMCs with high-order integrators, and develops theory to analyze finite-time convergence properties and their asymptotic invariant measures.
Exploration of the (Non-)Asymptotic Bias and Variance of Stochastic Gradient Langevin Dynamics
TLDR
A modified SGLD which removes the asymptotic bias due to the variance of the stochastic gradients up to first order in the step size is derived and bounds on the finite-time bias, variance and mean squared error are obtained.
Finite-Time Analysis of Projected Langevin Monte Carlo
TLDR
It is shown that LMC allows sampling in polynomial time from a posterior distribution restricted to a convex body and with concave log-likelihood, which gives the first Markov chain to sample from a log-concave distribution with a first-order oracle.