# Non-convex Learning via Replica Exchange Stochastic Gradient MCMC

@article{Deng2020NonconvexLV, title={Non-convex Learning via Replica Exchange Stochastic Gradient MCMC}, author={Wei Deng and Qi Feng and Liyao (Mars) Gao and Faming Liang and Guang Lin}, journal={Proceedings of Machine Learning Research}, year={2020}, volume={119}, pages={2474-2483}}

Replica exchange Monte Carlo (reMC), also known as parallel tempering, is an important technique for accelerating the convergence of conventional Markov chain Monte Carlo (MCMC) algorithms. However, the method requires evaluating the energy function on the full dataset and therefore does not scale to big data. A naïve mini-batch implementation of reMC introduces large biases, so the method cannot be directly extended to stochastic gradient MCMC (SGMCMC), the standard…
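As a rough illustration of the setup (not the paper's exact algorithm), a parallel-tempering SGLD loop might look like the following sketch; the function names, step size, temperatures, and the `correction` term standing in for a bias adjustment of the noisy swap criterion are all illustrative assumptions:

```python
import numpy as np

def re_sgld(grad_U, U_hat, x0, n_iters=1000, lr=1e-3,
            temps=(1.0, 10.0), correction=0.0, rng=None):
    """Parallel-tempering SGLD sketch: run two SGLD chains at different
    temperatures and occasionally swap them based on noisy energies."""
    rng = np.random.default_rng() if rng is None else rng
    x = [np.array(x0, dtype=float), np.array(x0, dtype=float)]
    samples = []
    for _ in range(n_iters):
        # Langevin update for each chain at its own temperature.
        for k, tau in enumerate(temps):
            noise = np.sqrt(2.0 * lr * tau) * rng.standard_normal(x[k].shape)
            x[k] = x[k] - lr * grad_U(x[k]) + noise
        # Swap test on the noisy energy difference; `correction` stands in
        # for a bias adjustment of the naive mini-batch criterion.
        dE = U_hat(x[0]) - U_hat(x[1]) - correction
        swap_prob = min(1.0, float(np.exp((1.0 / temps[0] - 1.0 / temps[1]) * dE)))
        if rng.random() < swap_prob:
            x[0], x[1] = x[1], x[0]
        samples.append(x[0].copy())  # keep the low-temperature chain
    return np.array(samples)
```

The low-temperature chain exploits local modes while the high-temperature chain explores; swaps let exploration feed the exploited chain.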

## 14 Citations

### Accelerating Convergence of Replica Exchange Stochastic Gradient MCMC via Variance Reduction

- Computer Science, ICLR
- 2021

Variance reduction for noisy energy estimators is studied, which promotes much more effective swaps in replica exchange stochastic gradient Langevin dynamics, and a non-asymptotic analysis of the exponential acceleration of the underlying continuous-time Markov jump process is provided.
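A common way to reduce the variance of a noisy energy estimator is a control variate at a fixed anchor point; whether this matches the paper's exact estimator is not claimed here, so treat the following as a generic illustration with hypothetical names:

```python
def cv_energy(U_i, x, anchor, full_energy_anchor, batch_idx, N):
    """Control-variate estimate of the full-data energy U(x) = sum_i U_i(x):
    estimate the mini-batch *difference* from a fixed anchor point and add
    back the anchor's exact full-data energy, which is precomputed once."""
    n = len(batch_idx)
    diff = sum(U_i(i, x) - U_i(i, anchor) for i in batch_idx)
    return (N / n) * diff + full_energy_anchor
```

When `x` stays close to the anchor, the per-sample differences are small and the estimator's variance is much lower than that of the naive mini-batch energy.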

### Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling

- Computer Science, Mathematics, UAI
- 2021

A novel conductance analysis of SGLD using an auxiliary time-reversible Markov chain is provided, and it is proved that, under certain conditions on the target distribution, a sufficient number of stochastic gradient evaluations guarantees $\epsilon$-sampling error in terms of the total variation distance.

### A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions

- Computer Science, NeurIPS
- 2020

Theoretically, the CSGLD algorithm is shown to satisfy a stability condition, and the asymptotic convergence of the self-adapting parameter to a unique fixed point is established, regardless of the non-convexity of the original energy function.

### A New Framework for Variance-Reduced Hamiltonian Monte Carlo

- Computer Science, arXiv
- 2021

Experimental results on both synthetic and real-world benchmark data show that the new framework of variance-reduced Hamiltonian Monte Carlo methods significantly outperforms the full gradient and stochastic gradient HMC approaches.

### Stein Self-Repulsive Dynamics: Benefits From Past Samples

- Computer Science, NeurIPS
- 2020

We propose a new Stein self-repulsive dynamics for obtaining diversified samples from intractable un-normalized distributions. Our idea is to introduce Stein variational gradient as a repulsive force…
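The Stein variational gradient used there combines a kernel-weighted pull toward high density with a kernel-gradient repulsion; a minimal single-step sketch (RBF kernel, illustrative parameter names) is:

```python
import numpy as np

def svgd_step(particles, grad_log_p, bandwidth=1.0, step=0.1):
    """One SVGD update: kernel-weighted attraction toward high density
    (via grad log p) plus an RBF-kernel repulsion that keeps the
    particles diversified."""
    n = len(particles)
    diffs = particles[:, None, :] - particles[None, :, :]   # x_i - x_j
    K = np.exp(-np.sum(diffs ** 2, axis=-1) / (2.0 * bandwidth ** 2))
    grads = np.stack([grad_log_p(x) for x in particles])
    attract = K @ grads / n                                 # drive toward density
    repulse = (K[:, :, None] * diffs).sum(axis=1) / (n * bandwidth ** 2)
    return particles + step * (attract + repulse)
```

The repulsion term is exactly the gradient of the RBF kernel with respect to the other particle, so nearby particles push each other apart while isolated ones are barely affected.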

### UPANets: Learning from the Universal Pixel Attention Networks

- Computer Science, Entropy
- 2022

This work proposes an efficient yet robust backbone equipped with channel and spatial attention; these attentions help expand the receptive fields of shallow convolutional layers and pass information on to every layer.

### Interacting Contour Stochastic Gradient Langevin Dynamics

- Computer Science, ICLR
- 2022

It is shown that ICSGLD can be theoretically more efficient than a single-chain CSGLD with an equivalent computational budget, and a novel random-field function is introduced that facilitates the estimation of self-adapting parameters on big data and yields mode exploration essentially for free.

### Multi-variance replica exchange stochastic gradient MCMC for inverse and forward Bayesian physics-informed neural network

- Computer Science, arXiv
- 2021

The proposed multi-variance replica exchange stochastic gradient Langevin diffusion method is employed to train Bayesian PINNs for forward and inverse problems; it significantly lowers the computational cost of the high-temperature chain while preserving accuracy and converging quickly.

### Hessian-Free High-Resolution Nesterov Acceleration for Sampling

- Computer Science, ICML
- 2022

For (not-necessarily-strongly-) convex and $L$-smooth potentials, exponential convergence in $\chi^2$ divergence is proved, with a rate analogous to state-of-the-art results of underdamped Langevin dynamics, plus an additional acceleration.

## References

Showing 1–10 of 53 references.

### Consistency and Fluctuations For Stochastic Gradient Langevin Dynamics

- Computer Science, J. Mach. Learn. Res.
- 2016

This article proves that, under verifiable assumptions, the SGLD algorithm is consistent, satisfies a central limit theorem (CLT), and has an asymptotic bias-variance decomposition that can be characterized by an explicit functional of the step-size sequence $(\delta_m)_{m \ge 0}$.
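The recursion analyzed there is plain SGLD driven by a step-size sequence; a minimal sketch (names and the toy decreasing schedule in the usage below are illustrative) might be:

```python
import numpy as np

def sgld(grad_log_pi_est, x0, step_sizes, rng=None):
    """Plain SGLD: x_{m+1} = x_m + (delta_m / 2) * g_hat(x_m) + N(0, delta_m I),
    with g_hat an unbiased stochastic estimate of the gradient of log pi."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    trace = []
    for delta in step_sizes:
        x = (x + 0.5 * delta * grad_log_pi_est(x)
             + np.sqrt(delta) * rng.standard_normal(x.shape))
        trace.append(x.copy())
    return np.array(trace)
```

For example, `sgld(lambda x: -x, [0.0], [0.05 / (1 + m) ** 0.55 for m in range(1000)])` runs a decreasing-step chain targeting a standard normal.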

### Stochastic Quasi-Newton Langevin Monte Carlo

- Computer Science, ICML
- 2016

This study proposes a novel SG-MCMC method that takes the local geometry into account by using ideas from Quasi-Newton optimization methods, and achieves fast convergence rates similar to Riemannian approaches while at the same time having low computational requirements similar to diagonal preconditioning approaches.

### A Complete Recipe for Stochastic Gradient MCMC

- Computer Science, Mathematics, NIPS
- 2015

This paper provides a general recipe for constructing MCMC samplers, including stochastic gradient versions, based on continuous Markov processes specified via two matrices, and uses the recipe to straightforwardly propose a new state-adaptive sampler: stochastic gradient Riemann Hamiltonian Monte Carlo (SGRHMC).

### On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators

- Computer Science, NIPS
- 2015

This paper considers general SG-MCMCs with high-order integrators, and develops theory to analyze finite-time convergence properties and their asymptotic invariant measures.

### Stochastic Gradient Hamiltonian Monte Carlo

- Computer Science, ICML
- 2014

A variant is introduced that uses second-order Langevin dynamics with a friction term counteracting the effects of the noisy gradient, maintaining the desired target distribution as the invariant distribution.
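A schematic version of such a friction-corrected update (the discretization, names, and constants are illustrative, not the paper's exact scheme) is:

```python
import numpy as np

def sghmc(grad_U_est, x0, n_iters=1000, lr=1e-2, friction=1.0, rng=None):
    """Second-order (underdamped) Langevin sketch: a momentum variable r
    with a friction term friction * r that counteracts the extra noise
    injected by the stochastic gradient estimate."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    r = np.zeros_like(x)
    samples = []
    for _ in range(n_iters):
        x = x + lr * r
        r = (r - lr * grad_U_est(x) - lr * friction * r
             + np.sqrt(2.0 * friction * lr) * rng.standard_normal(x.shape))
        samples.append(x.copy())
    return np.array(samples)
```

Without the friction term the gradient noise pumps energy into the momentum and the chain drifts away from the target; the friction dissipates it.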

### Orthogonal parallel MCMC methods for sampling and optimization

- Computer Science, Digit. Signal Process.
- 2016

### Accelerating Nonconvex Learning via Replica Exchange Langevin diffusion

- Computer Science, ICLR
- 2019

This work theoretically analyzes the acceleration effect of replica exchange from two perspectives, the convergence in $\chi^2$-divergence and the large deviation principle, and obtains a discrete-time algorithm by discretizing the replica exchange Langevin diffusion.

### Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks

- Computer Science, AAAI
- 2016

This work proposes combining adaptive preconditioners with stochastic gradient Langevin dynamics, gives theoretical properties on asymptotic convergence and predictive risk, and presents empirical results for logistic regression, feedforward neural nets, and convolutional neural nets demonstrating that the preconditioned SGLD method gives state-of-the-art performance.
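An RMSprop-style diagonal preconditioner applied to SGLD can be sketched as follows; this is a simplified illustration (the paper's exact correction term is omitted, and all hyperparameter values are assumptions):

```python
import numpy as np

def psgld(grad_U_est, x0, n_iters=2000, lr=1e-3, beta=0.99, eps=1e-1,
          rng=None):
    """RMSprop-preconditioned SGLD sketch: scale both the drift and the
    injected noise by a diagonal preconditioner built from a running
    second moment of the stochastic gradients."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    samples = []
    for _ in range(n_iters):
        g = grad_U_est(x)
        v = beta * v + (1.0 - beta) * g * g      # running second moment
        G = 1.0 / (np.sqrt(v) + eps)             # diagonal preconditioner
        x = (x - 0.5 * lr * G * g
             + np.sqrt(lr * G) * rng.standard_normal(x.shape))
        samples.append(x.copy())
    return np.array(samples)
```

Scaling the noise covariance by the same `G` as the drift keeps the two terms consistent, which is what distinguishes this from simply running RMSprop with added noise.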

### Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis

- Computer Science, Mathematics, COLT
- 2017

The present work provides a nonasymptotic analysis in the context of non-convex learning problems, giving finite-time guarantees for SGLD to find approximate minimizers of both empirical and population risks.

### A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks

- Computer Science, ICML
- 2019

It is argued that the Gaussianity assumption might fail to hold in deep learning settings, rendering Brownian motion-based analyses inappropriate; this opens up a different perspective and sheds more light on the belief that SGD prefers wide minima.