• Corpus ID: 221103631

# Non-convex Learning via Replica Exchange Stochastic Gradient MCMC

@article{Deng2020NonconvexLV,
  title={Non-convex Learning via Replica Exchange Stochastic Gradient MCMC},
  author={Wei Deng and Qi Feng and Liyao (Mars) Gao and Faming Liang and Guang Lin},
  journal={Proceedings of Machine Learning Research},
  year={2020},
  volume={119},
  pages={2474-2483}
}
• Published 1 July 2020
• Computer Science
• Proceedings of machine learning research
Replica exchange Monte Carlo (reMC), also known as parallel tempering, is an important technique for accelerating the convergence of conventional Markov chain Monte Carlo (MCMC) algorithms. However, the method requires evaluating the energy function on the full dataset and is therefore not scalable to big data. A naïve implementation of reMC in mini-batch settings introduces large biases, so the method cannot be directly extended to stochastic gradient MCMC (SGMCMC), the standard…
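The two-replica scheme the abstract describes can be made concrete with a minimal sketch. The loop below is illustrative only, not the paper's implementation: `U_hat` stands in for a noisy mini-batch energy estimator, and `correction` is a placeholder for the bias-correction term the paper develops for the swap test.

```python
import numpy as np

def re_sgld(grad_U, U_hat, theta_lo, theta_hi, tau_lo, tau_hi,
            eta=1e-3, correction=0.0, n_steps=1000, rng=None):
    """Minimal replica exchange SGLD sketch (illustrative, hypothetical form).

    Two chains run SGLD at a low and a high temperature; after each step a
    swap is attempted whose acceptance ratio uses noisy energy estimates
    U_hat, shifted by `correction` (a stand-in for the paper's correction).
    """
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(n_steps):
        # SGLD update for each replica at its own temperature tau
        for tau, theta in ((tau_lo, theta_lo), (tau_hi, theta_hi)):
            theta -= eta * grad_U(theta)                       # in-place drift
            theta += np.sqrt(2 * eta * tau) * rng.standard_normal(theta.shape)
        # Tempering swap test between the low- and high-temperature replicas
        log_ratio = (1.0 / tau_lo - 1.0 / tau_hi) * (
            U_hat(theta_lo) - U_hat(theta_hi) - correction)
        if np.log(rng.random()) < log_ratio:
            theta_lo, theta_hi = theta_hi, theta_lo
    return theta_lo, theta_hi
```

With an exact (full-data) `U_hat` this reduces to standard replica exchange Langevin; the paper's contribution concerns what to do when `U_hat` is a noisy mini-batch estimate.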

## Citations

### Accelerating Convergence of Replica Exchange Stochastic Gradient MCMC via Variance Reduction

• Computer Science
ICLR
• 2021
Variance reduction for noisy energy estimators is studied, which promotes much more effective swaps in replica exchange stochastic gradient Langevin dynamics; a non-asymptotic analysis of the exponential acceleration of the underlying continuous-time Markov jump process is also provided.
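The variance-reduction idea can be sketched with a control-variate energy estimator (an assumed form, not the authors' code): the mini-batch energy at the current point is estimated relative to a fixed anchor point whose full-data energy has been precomputed.

```python
import numpy as np

def cv_energy_estimate(U_i, theta, theta_anchor, U_anchor_full,
                       batch_idx, n_total):
    """Control-variate energy estimator sketch (hypothetical form).

    U_i(theta, idx) returns per-example energies on a mini-batch;
    U_anchor_full is the precomputed full-data energy at theta_anchor.
    The mini-batch only estimates the *difference* in energy, which has
    much lower variance when theta is close to the anchor.
    """
    diff = U_i(theta, batch_idx) - U_i(theta_anchor, batch_idx)
    return U_anchor_full + n_total * float(np.mean(diff))
```

At the anchor itself the estimator is exact, and its variance grows only with the distance from the anchor rather than with the scale of the energies.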

### Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling

• Computer Science, Mathematics
UAI
• 2021
A novel conductance analysis of SGLD using an auxiliary time-reversible Markov chain is provided, and it is proved that, under certain conditions on the target distribution, stochastic gradient evaluations suffice to guarantee $\epsilon$-sampling error in terms of the total variation distance.

### A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions

• Computer Science
NeurIPS
• 2020
Theoretically, the CSGLD algorithm is proved to satisfy a stability condition, and the asymptotic convergence of the self-adapting parameter to a unique fixed point is established, regardless of the non-convexity of the original energy function.

### A New Framework for Variance-Reduced Hamiltonian Monte Carlo

• Computer Science
ArXiv
• 2021
Experimental results on both synthetic and real-world benchmark data show that the new framework of variance-reduced Hamiltonian Monte Carlo methods significantly outperforms the full gradient and stochastic gradient HMC approaches.

### Stein Self-Repulsive Dynamics: Benefits From Past Samples

• Computer Science
NeurIPS
• 2020
We propose a new Stein self-repulsive dynamics for obtaining diversified samples from intractable unnormalized distributions. Our idea is to introduce the Stein variational gradient as a repulsive force…

### UPANets: Learning from the Universal Pixel Attention Networks

• Computer Science
Entropy
• 2022
This work proposes an efficient but robust backbone equipped with channel- and spatial-direction attentions; the attentions help expand receptive fields in shallow convolutional layers and pass information to every layer.

### Interacting Contour Stochastic Gradient Langevin Dynamics

• Computer Science
ICLR
• 2022
It is shown that ICSGLD can be theoretically more efficient than a single-chain CSGLD with an equivalent computational budget, and a novel random-field function is proposed that facilitates the estimation of self-adapting parameters in big data and obtains free mode explorations.

### Multi-variance replica exchange stochastic gradient MCMC for inverse and forward Bayesian physics-informed neural network

• Computer Science
ArXiv
• 2021
The proposed multi-variance replica exchange stochastic gradient Langevin diffusion method is employed to train Bayesian PINNs to solve forward and inverse problems; it significantly lowers the computational cost in the high-temperature chain while preserving accuracy and converging quickly.

### Hessian-Free High-Resolution Nesterov Acceleration for Sampling

• Computer Science
ICML
• 2022
For (not-necessarily-strongly-) convex and $L$-smooth potentials, exponential convergence in $\chi^2$ divergence is proved, with a rate analogous to state-of-the-art results of underdamped Langevin dynamics, plus an additional acceleration.

## References

Showing 1-10 of 53 references

### Consistency and Fluctuations For Stochastic Gradient Langevin Dynamics

• Computer Science
J. Mach. Learn. Res.
• 2016
This article proves that, under verifiable assumptions, the SGLD algorithm is consistent, satisfies a central limit theorem (CLT), and that its asymptotic bias-variance decomposition can be characterized by an explicit functional of the step-size sequence $(\delta_m)_{m \geq 0}$.
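For reference, the SGLD recursion such analyses study can be sketched with a polynomially decaying step-size sequence; the decay constant and exponent below are illustrative choices, not values from the paper.

```python
import numpy as np

def sgld(grad_log_post_est, theta0, m_steps=1000, a=1e-2, gamma=0.55,
         rng=None):
    """Plain SGLD sketch with decaying step sizes delta_m = a * (m+1)^(-gamma).

    grad_log_post_est is a (possibly mini-batch) estimate of the gradient
    of the log posterior. Illustrative only.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.array(theta0, dtype=float)
    for m in range(m_steps):
        delta = a * (m + 1) ** (-gamma)
        theta += 0.5 * delta * grad_log_post_est(theta)          # drift
        theta += np.sqrt(delta) * rng.standard_normal(theta.shape)  # noise
    return theta
```

The exponent gamma in (0.5, 1] is the regime usually considered for such step-size sequences; constant step sizes are analyzed separately and incur an asymptotic bias.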

### Stochastic Quasi-Newton Langevin Monte Carlo

• Computer Science
ICML
• 2016
This study proposes a novel SG-MCMC method that takes the local geometry into account by using ideas from Quasi-Newton optimization methods, and achieves fast convergence rates similar to Riemannian approaches while at the same time having low computational requirements similar to diagonal preconditioning approaches.

### A Complete Recipe for Stochastic Gradient MCMC

• Computer Science, Mathematics
NIPS
• 2015
This paper provides a general recipe for constructing MCMC samplers, including stochastic gradient versions, based on continuous Markov processes specified via two matrices, and uses the recipe to straightforwardly propose a new state-adaptive sampler: stochastic gradient Riemannian Hamiltonian Monte Carlo (SGRHMC).

### On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators

• Computer Science
NIPS
• 2015
This paper considers general SG-MCMCs with high-order integrators, and develops theory to analyze finite-time convergence properties and their asymptotic invariant measures.

### Stochastic Gradient Hamiltonian Monte Carlo

• Computer Science
ICML
• 2014
A variant is introduced that uses second-order Langevin dynamics with a friction term that counteracts the effects of the noisy gradient, maintaining the desired target distribution as the invariant distribution.
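The friction-damped update can be sketched as follows. This is an illustrative form, not the paper's implementation: `friction` plays the role of the friction coefficient and `noise_est` is a stand-in for the estimated gradient-noise term that the injected noise must compensate for.

```python
import numpy as np

def sghmc(grad_U_est, theta0, eta=1e-3, friction=0.1, noise_est=0.0,
          n_steps=1000, rng=None):
    """SGHMC-style sketch: second-order Langevin dynamics with friction.

    The friction term damps the momentum to counteract stochastic
    gradient noise; the injected noise scale is reduced by the
    estimated noise term. Illustrative, hypothetical form.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.array(theta0, dtype=float)
    r = np.zeros_like(theta)  # momentum
    for _ in range(n_steps):
        theta = theta + eta * r
        r = (r - eta * grad_U_est(theta)          # force from the potential
             - eta * friction * r                  # friction damping
             + np.sqrt(2 * eta * max(friction - noise_est, 0.0))
             * rng.standard_normal(theta.shape))
    return theta
```

With zero friction and an exact gradient this reduces to a plain leapfrog-style HMC integrator; the friction is what restores the target as the invariant distribution under noisy gradients.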

### Accelerating Nonconvex Learning via Replica Exchange Langevin diffusion

• Yi Chen, Jing Dong
• Computer Science
ICLR
• 2019
This work theoretically analyzes the acceleration effect of replica exchange from two perspectives, the convergence in $\chi^2$-divergence and the large deviation principle, and obtains a discrete-time algorithm by discretizing the replica exchange Langevin diffusion.

### Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks

• Computer Science
AAAI
• 2016
This work proposes combining adaptive preconditioners with stochastic gradient Langevin dynamics; theoretical properties on asymptotic convergence and predictive risk are given, and empirical results for logistic regression, feedforward neural nets, and convolutional neural nets demonstrate that the preconditioned SGLD method gives state-of-the-art performance.
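A minimal RMSprop-style preconditioned SGLD sketch (an illustrative form, not the paper's exact algorithm): a diagonal preconditioner built from a running average of squared gradients rescales both the drift and the injected noise.

```python
import numpy as np

def psgld(grad_U_est, theta0, eta=1e-3, alpha=0.99, eps=1e-2,
          n_steps=1000, rng=None):
    """Preconditioned SGLD sketch with an RMSprop-style diagonal metric.

    v is an exponential moving average of squared gradients; the
    preconditioner G = 1 / (eps + sqrt(v)) scales both the gradient
    step and the Gaussian noise. Illustrative, hypothetical form.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.array(theta0, dtype=float)
    v = None
    for _ in range(n_steps):
        g = grad_U_est(theta)
        v = g * g if v is None else alpha * v + (1 - alpha) * g * g
        G = 1.0 / (eps + np.sqrt(v))  # diagonal preconditioner
        theta = (theta - 0.5 * eta * G * g
                 + np.sqrt(eta * G) * rng.standard_normal(theta.shape))
    return theta
```

The appeal of a diagonal preconditioner is that it costs the same as the plain SGLD update while adapting the step to per-coordinate gradient scales.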

### Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis

• Computer Science, Mathematics
COLT
• 2017
The present work provides a nonasymptotic analysis in the context of non-convex learning problems, giving finite-time guarantees for SGLD to find approximate minimizers of both empirical and population risks.

### A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks

• Computer Science
ICML
• 2019
It is argued that the Gaussianity assumption might fail to hold in deep learning settings, rendering Brownian motion-based analyses inappropriate; this opens up a different perspective and sheds more light on the belief that SGD prefers wide minima.