• Corpus ID: 235248142

# Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient adaptive algorithms for neural networks

@article{Lim2021PolygonalUL,
title={Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient adaptive algorithms for neural networks},
author={Dong-Young Lim and Sotirios Sabanis},
journal={ArXiv},
year={2021},
volume={abs/2105.13937}
}
• Published 28 May 2021
• Computer Science
• ArXiv
We present a new class of Langevin-based algorithms which overcomes many of the known shortcomings of popular adaptive optimizers that are currently used for the fine-tuning of deep learning models. Its underpinning theory relies on recent advances in Euler's polygonal approximations for stochastic differential equations (SDEs) with monotone coefficients. As a result, it inherits the stability properties of tamed algorithms, while it addresses other known issues, e.g. vanishing gradients in…
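The abstract's reference to tamed (polygonal) Euler schemes can be illustrated with a short sketch. Below is a minimal, hedged Python version of a *tamed* unadjusted Langevin update; the function name, step size, and inverse temperature are illustrative assumptions, and the paper's actual polygonal scheme tames the drift in a more refined (component-wise) way:

```python
import numpy as np

def tamed_langevin_step(theta, grad, lam=1e-2, beta=1e8, rng=None):
    """One step of a tamed unadjusted Langevin update (illustrative sketch).

    The gradient is divided by (1 + lam * ||grad||), which keeps the drift
    bounded even when a super-linearly growing gradient would otherwise
    explode -- the stabilising idea behind tamed/polygonal Euler schemes.
    """
    rng = np.random.default_rng() if rng is None else rng
    tamed_drift = grad / (1.0 + lam * np.linalg.norm(grad))
    noise = np.sqrt(2.0 * lam / beta) * rng.standard_normal(theta.shape)
    return theta - lam * tamed_drift + noise

# Minimising f(x) = x^4 / 4, whose gradient x^3 grows super-linearly;
# an untamed Euler step from theta = 10 with this step size would blow up.
theta = np.array([10.0])
for _ in range(2000):
    theta = tamed_langevin_step(theta, theta**3, lam=0.05)
```

The taming factor leaves the drift essentially unchanged where the gradient is moderate, so in benign regions the scheme behaves like ordinary SGLD.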
4 Citations


• Computer Science
ArXiv
• 2022
We introduce a new Langevin dynamics based algorithm, called e-THεO POULA, to solve optimization problems with discontinuous stochastic gradients which naturally appear in real-world
This work utilizes stochastic gradient Langevin dynamics and its underdamped counterpart for the global optimization of the χ-divergence and derives non-asymptotic bounds for the MSE by leveraging recent results from the non-convex optimization literature.
• Mathematics
• 2022
Existence, uniqueness, and L^p-approximation results are presented for scalar stochastic differential equations (SDEs) by considering the case where the drift coefficient has finitely many spatial
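Several of the citing works above build on stochastic gradient Langevin dynamics (SGLD). For contrast with the tamed variants this paper develops, a bare-bones *untamed* SGLD iteration might look like the following sketch (the step size and inverse temperature are illustrative assumptions):

```python
import numpy as np

def sgld_step(theta, grad, lam=1e-3, beta=1e6, rng=None):
    """One plain (untamed) SGLD step: a gradient step plus Gaussian noise.

    Without taming, a super-linearly growing gradient can make this
    iteration diverge -- the instability that tamed/polygonal schemes
    are designed to avoid.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = np.sqrt(2.0 * lam / beta) * rng.standard_normal(theta.shape)
    return theta - lam * grad + noise

# Minimising the well-behaved objective f(x) = (x - 3)^2 / 2,
# where plain SGLD is perfectly stable:
theta = np.array([0.0])
for _ in range(5000):
    theta = sgld_step(theta, theta - 3.0)
```

On objectives with only linearly growing gradients, as here, the two schemes behave almost identically; the difference shows up for super-linear drifts.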
