Corpus ID: 221103631

Non-convex Learning via Replica Exchange Stochastic Gradient MCMC

Wei Deng, Qi Feng, Liyao (Mars) Gao, Faming Liang and Guang Lin. Proceedings of Machine Learning Research.
Replica exchange Monte Carlo (reMC), also known as parallel tempering, is an important technique for accelerating the convergence of conventional Markov chain Monte Carlo (MCMC) algorithms. However, the method requires evaluating the energy function on the full dataset and is not scalable to big data. The naïve implementation of reMC in mini-batch settings introduces large biases, so it cannot be directly extended to stochastic gradient MCMC (SGMCMC), the standard… 
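The mechanism the abstract describes can be sketched as follows. This is a minimal illustrative toy, not the paper's bias-corrected algorithm: a double-well potential, two replicas, and hypothetical step size and temperatures, with full-batch energies used in the swap test. If the energies in the Metropolis swap were replaced by noisy mini-batch estimates, the acceptance ratio would suffer exactly the bias the abstract mentions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy double-well potential U(x) = (x^2 - 1)^2 and its gradient.
def U(x):
    return (x**2 - 1.0)**2

def grad_U(x):
    return 4.0 * x * (x**2 - 1.0)

def replica_exchange_sgld(steps=5000, lr=1e-3, temps=(1.0, 10.0)):
    tau = np.asarray(temps)
    x = np.array([-1.0, 1.0])  # one replica per temperature
    for _ in range(steps):
        # Langevin step per replica; in the SGMCMC setting grad_U would
        # be a mini-batch estimate, here it is exact for simplicity.
        x = x - lr * grad_U(x) + np.sqrt(2.0 * lr * tau) * rng.standard_normal(2)
        # Metropolis swap using full energies; with noisy mini-batch
        # energies this acceptance ratio is what becomes biased.
        log_alpha = (1.0 / tau[0] - 1.0 / tau[1]) * (U(x[0]) - U(x[1]))
        if np.log(rng.random()) < log_alpha:
            x = x[::-1].copy()
    return x

samples = replica_exchange_sgld()
```

The high-temperature replica explores both wells freely while the low-temperature replica exploits locally; swaps let the cold chain inherit the hot chain's discoveries.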


Accelerating Convergence of Replica Exchange Stochastic Gradient MCMC via Variance Reduction

The variance reduction for noisy energy estimators is studied, which promotes much more effective swaps in replica exchange stochastic gradient Langevin dynamics, and a non-asymptotic analysis of the exponential acceleration of the underlying continuous-time Markov jump process is provided.

Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling

A novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain is provided and it is proved that under certain conditions on the target distribution, stochastic gradient evaluations suffice to guarantee $\epsilon$-sampling error in terms of the total variation distance.

A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions

Theoretically, the CSGLD algorithm is shown to satisfy a stability condition, and the asymptotic convergence of the self-adapting parameter to a unique fixed point is established regardless of the non-convexity of the original energy function.

A New Framework for Variance-Reduced Hamiltonian Monte Carlo

Experimental results on both synthetic and real-world benchmark data show that the new framework of variance-reduced Hamiltonian Monte Carlo methods significantly outperforms the full gradient and stochastic gradient HMC approaches.

Stein Self-Repulsive Dynamics: Benefits From Past Samples

We propose a new Stein self-repulsive dynamics for obtaining diversified samples from intractable un-normalized distributions. Our idea is to introduce Stein variational gradient as a repulsive force

UPANets: Learning from the Universal Pixel Attention Networks

This work proposes an efficient yet robust backbone equipped with channel and spatial attention; the attention helps expand receptive fields in shallow convolutional layers and passes information to every layer.

Interacting Contour Stochastic Gradient Langevin Dynamics

It is shown that ICSGLD can be theoretically more efficient than a single-chain CSGLD with an equivalent computational budget, and a novel random-field function is proposed that facilitates the estimation of self-adapting parameters in big data and yields free mode exploration.

Multi-variance replica exchange stochastic gradient MCMC for inverse and forward Bayesian physics-informed neural network

The proposed multi-variance replica exchange stochastic gradient Langevin diffusion method is employed to train Bayesian PINNs for forward and inverse problems; it significantly lowers the computational cost of the high-temperature chain while preserving accuracy and converging quickly.



Hessian-Free High-Resolution Nesterov Acceleration for Sampling

For (not-necessarily-strongly-) convex and $L$-smooth potentials, exponential convergence in $\chi^2$ divergence is proved, with a rate analogous to state-of-the-art results of underdamped Langevin dynamics, plus an additional acceleration.



Consistency and Fluctuations For Stochastic Gradient Langevin Dynamics

This article proves that, under verifiable assumptions, the SGLD algorithm is consistent, satisfies a central limit theorem (CLT), and its asymptotic bias-variance decomposition can be characterized by an explicit functional of the step-size sequence $(\delta_m)_{m \ge 0}$.

Stochastic Quasi-Newton Langevin Monte Carlo

This study proposes a novel SG-MCMC method that takes the local geometry into account by using ideas from Quasi-Newton optimization methods, and achieves fast convergence rates similar to Riemannian approaches while at the same time having low computational requirements similar to diagonal preconditioning approaches.

A Complete Recipe for Stochastic Gradient MCMC

This paper provides a general recipe for constructing MCMC samplers--including stochastic gradient versions--based on continuous Markov processes specified via two matrices, and uses the recipe to straightforwardly propose a new state-adaptive sampler: stochastic gradient Riemannian Hamiltonian Monte Carlo (SGRHMC).

On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators

This paper considers general SG-MCMCs with high-order integrators, and develops theory to analyze finite-time convergence properties and their asymptotic invariant measures.

Stochastic Gradient Hamiltonian Monte Carlo

A variant is introduced that uses second-order Langevin dynamics with a friction term counteracting the effects of the noisy gradient, maintaining the desired target distribution as the invariant distribution.
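The friction mechanism can be illustrated with a minimal sketch; the setup here is assumed for illustration (a 1-D standard Gaussian target, a hypothetical noise scale standing in for mini-batch gradient noise, and a naive Euler discretization rather than the paper's scheme):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_grad_U(x, noise_scale=0.5):
    # Gradient of U(x) = x^2 / 2 plus injected noise that mimics
    # the noise of a mini-batch gradient estimate.
    return x + noise_scale * rng.standard_normal()

def sghmc(steps=20000, eta=1e-2, friction=1.0):
    x, v = 0.0, 0.0
    xs = np.empty(steps)
    for t in range(steps):
        # Second-order Langevin dynamics: the -friction * v term bleeds
        # off the extra energy injected by the noisy gradient, keeping
        # the target N(0, 1) approximately invariant.
        v += -eta * noisy_grad_U(x) - friction * eta * v \
             + np.sqrt(2.0 * friction * eta) * rng.standard_normal()
        x += eta * v
        xs[t] = x
    return xs

samples = sghmc()
```

Without the friction term, the gradient noise would steadily pump energy into the momentum and the chain would sample from a flattened, incorrect distribution.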

Accelerating Nonconvex Learning via Replica Exchange Langevin diffusion

This work theoretically analyzes the acceleration effect of replica exchange from two perspectives, convergence in $\chi^2$-divergence and the large deviation principle, and obtains a discrete-time algorithm by discretizing the replica exchange Langevin diffusion.

Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks

This work proposes combining adaptive preconditioners with stochastic gradient Langevin dynamics, gives theoretical results on asymptotic convergence and predictive risk, and presents empirical results for logistic regression, feedforward neural networks, and convolutional neural networks demonstrating that the preconditioned SGLD method gives state-of-the-art performance.
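A minimal sketch of the idea, assuming an RMSprop-style diagonal preconditioner on an ill-conditioned 2-D Gaussian target; the curvature-correction term of the full update is omitted, and all step sizes and constants are illustrative rather than taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_grad_U(x):
    # U(x) = x0^2/200 + x1^2/2: an ill-conditioned quadratic whose
    # coordinates have target variances 100 and 1. The added noise
    # stands in for mini-batch gradient noise.
    return x / np.array([100.0, 1.0]) + 0.1 * rng.standard_normal(2)

def psgld(steps=20000, lr=1e-2, beta=0.99, eps=1e-5):
    x = np.zeros(2)
    v = np.zeros(2)                    # running average of squared gradients
    xs = np.empty((steps, 2))
    for t in range(steps):
        g = noisy_grad_U(x)
        v = beta * v + (1.0 - beta) * g**2
        G = 1.0 / (np.sqrt(v) + eps)   # diagonal preconditioner
        # Drift and injected noise share the preconditioner G, so the
        # poorly scaled coordinate takes proportionally larger steps.
        x = x - 0.5 * lr * G * g + np.sqrt(lr * G) * rng.standard_normal(2)
        xs[t] = x
    return xs

samples = psgld()
```

The preconditioner adapts step sizes per coordinate, which is why the flat direction of the target mixes far faster than it would under plain SGLD with a single scalar step size.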

Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis

The present work provides a nonasymptotic analysis in the context of non-convex learning problems, giving finite-time guarantees for SGLD to find approximate minimizers of both empirical and population risks.

A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks

It is argued that the Gaussianity assumption might fail to hold in deep learning settings, rendering Brownian motion-based analyses inappropriate; this opens up a different perspective and sheds more light on the belief that SGD prefers wide minima.