# Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient adaptive algorithms for neural networks

    @article{Lim2021PolygonalUL,
      title   = {Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient adaptive algorithms for neural networks},
      author  = {Dong-Young Lim and Sotirios Sabanis},
      journal = {ArXiv},
      year    = {2021},
      volume  = {abs/2105.13937}
    }

We present a new class of Langevin-based algorithms which overcomes many of the known shortcomings of popular adaptive optimizers that are currently used for the fine-tuning of deep learning models. Its underpinning theory relies on recent advances in Euler's polygonal approximations for stochastic differential equations (SDEs) with monotone coefficients. As a result, it inherits the stability properties of tamed algorithms, while it addresses other known issues, e.g. vanishing gradients in…
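As a rough illustration of the taming idea behind such schemes (not the paper's exact update, whose taming and boosting functions are more refined), a tamed stochastic gradient Langevin step divides the drift by a factor that grows with the gradient magnitude, so superlinearly growing gradients cannot make the iterates explode. The sketch below, with hypothetical names and a quartic toy objective, shows the stabilizing effect:

```python
import numpy as np

def tamed_sgld_step(theta, grad_fn, lam, beta, rng):
    """One tamed SGLD step: the drift is damped by 1 + lam*|g|, so even a
    superlinearly growing gradient moves theta by less than 1 per step."""
    g = grad_fn(theta)
    tamed_drift = g / (1.0 + lam * np.abs(g))  # componentwise taming
    noise = rng.normal(size=np.shape(theta))
    return theta - lam * tamed_drift + np.sqrt(2.0 * lam / beta) * noise

# Toy objective U(theta) = theta^4 has the superlinear gradient 4*theta^3,
# for which plain (untamed) SGLD with this step size diverges from theta0 = 10;
# the tamed chain stays bounded and drifts toward the minimizer at 0.
rng = np.random.default_rng(0)
theta = np.array([10.0])
for _ in range(200):
    theta = tamed_sgld_step(theta, lambda t: 4.0 * t**3,
                            lam=0.1, beta=1e8, rng=rng)
print(float(np.abs(theta)))
```

The key design point is that the taming factor vanishes as the gradient shrinks, so near a minimizer the update behaves like an ordinary Langevin step.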

## 4 Citations

### Langevin dynamics based algorithm e-THεO POULA for stochastic optimization problems with discontinuous stochastic gradient

- Computer Science
- ArXiv
- 2022

We introduce a new Langevin dynamics based algorithm, called e-THεO POULA, to solve optimization problems with discontinuous stochastic gradients which naturally appear in real-world…

### Global convergence of optimized adaptive importance samplers

- Computer Science
- 2022

This work utilizes stochastic gradient Langevin dynamics and its underdamped counterpart for the global optimization of the χ-divergence and derives nonasymptotic bounds for the MSE by leveraging recent results from the non-convex optimization literature.

### Existence, uniqueness and approximation of solutions of SDEs with superlinear coefficients in the presence of discontinuities of the drift coefficient

- Mathematics
- 2022

Existence, uniqueness, and L^p-approximation results are presented for scalar stochastic differential equations (SDEs) by considering the case where the drift coefficient has finitely many spatial…

## References

Showing 1–10 of 45 references.

### Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

- Computer Science
- J. Mach. Learn. Res.
- 2011

This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that could have been chosen in hindsight.

### Taming neural networks with TUSLA: Non-convex learning via adaptive stochastic gradient Langevin algorithms

- Computer Science
- ArXiv
- 2020

This work offers a new learning algorithm based on an appropriately constructed variant of the popular stochastic gradient Langevin dynamics (SGLD), called the tamed unadjusted stochastic Langevin algorithm (TUSLA), and provides finite-time guarantees for TUSLA to find approximate minimizers of both empirical and population risks.

### On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization

- Computer Science
- ICLR
- 2019

A set of mild sufficient conditions are provided that guarantee the convergence for the Adam-type methods and it is proved that under these derived conditions, these methods can achieve the convergence rate of order $O(\log{T}/\sqrt{T})$ for nonconvex stochastic optimization.

### Adam: A Method for Stochastic Optimization

- Computer Science
- ICLR
- 2015

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
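For reference, the moment-estimate update the Adam abstract describes can be sketched as a minimal scalar version with the standard default hyperparameters (variable names are illustrative):

```python
import math

def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=1000):
    """Minimal scalar Adam: exponential moving averages of the gradient (m)
    and squared gradient (v), bias-corrected, drive a normalized step."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias corrections
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Minimizing f(x) = (x - 3)^2 from x0 = 0 lands close to the optimum 3.
x_star = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_star)
```

Because the step is normalized by the square root of the second-moment estimate, its magnitude is roughly `lr` regardless of the raw gradient scale, which is exactly the adaptivity the abstract refers to.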

### The Marginal Value of Adaptive Gradient Methods in Machine Learning

- Computer Science
- NIPS
- 2017

It is observed that the solutions found by adaptive methods generalize worse (often significantly worse) than SGD, even when these solutions have better training performance, suggesting that practitioners should reconsider the use of adaptive methods to train neural networks.

### Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis

- Computer Science, Mathematics
- COLT
- 2017

The present work provides a nonasymptotic analysis in the context of non-convex learning problems, giving finite-time guarantees for SGLD to find approximate minimizers of both empirical and population risks.
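The SGLD iteration analyzed there is an Euler discretization of the Langevin SDE: a gradient step plus Gaussian noise scaled by the inverse temperature β. A minimal sketch follows (illustrative names; for U(x) = x²/2 and β = 1 the target Gibbs measure is the standard normal, which the long-run samples should approximate up to discretization bias):

```python
import numpy as np

def sgld_samples(grad_U, x0, lam, beta, n_steps, burn_in, seed=0):
    """Unadjusted Langevin/SGLD chain:
    x <- x - lam*grad_U(x) + sqrt(2*lam/beta)*xi,
    returning post-burn-in iterates as approximate Gibbs samples."""
    rng = np.random.default_rng(seed)
    x = x0
    out = []
    for n in range(n_steps):
        x = x - lam * grad_U(x) + np.sqrt(2.0 * lam / beta) * rng.normal()
        if n >= burn_in:
            out.append(x)
    return np.array(out)

# Target: Gibbs measure proportional to exp(-U(x)) with U(x) = x^2/2, i.e. N(0, 1).
samples = sgld_samples(lambda x: x, x0=5.0, lam=0.01, beta=1.0,
                       n_steps=50_000, burn_in=5_000)
print(samples.mean(), samples.var())
```

For small β (high temperature) the chain explores broadly; as β grows, the noise term shrinks and the iterates concentrate near minimizers of U, which is the mechanism behind the finite-time excess-risk guarantees.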

### Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization

- Computer Science
- NeurIPS
- 2018

For the first time, a global convergence guarantee is proved for variance-reduced stochastic gradient Langevin dynamics (VR-SGLD): it reaches an almost minimizer after $\tilde O\big(\sqrt{n}d^5/(\lambda^4\epsilon^{5/2})\big)$ stochastic gradient evaluations, which outperforms the gradient complexities of GLD and SGLD in a wide regime.

### Non-asymptotic convergence analysis for the Unadjusted Langevin Algorithm

- Mathematics, Computer Science
- 2015

For both constant and decreasing step sizes in the Euler discretization, non-asymptotic bounds for the convergence to the target distribution $\pi$ in total variation distance are obtained.

### Global Non-convex Optimization with Discretized Diffusions

- Computer Science, Mathematics
- NeurIPS
- 2018

This non-asymptotic analysis delivers computable optimization and integration error bounds based on easily accessed properties of the objective and chosen diffusion and complements these results with improved optimization guarantees for targets other than the standard Gibbs measure.

### Nonasymptotic Estimates for Stochastic Gradient Langevin Dynamics Under Local Conditions in Nonconvex Optimization

- Computer Science
- Applied Mathematics & Optimization
- 2023

The Wasserstein-2 convergence result can be applied to establish a non-asymptotic error bound for the expected excess risk and the importance of this relaxation is illustrated by presenting examples from variational inference and from index tracking optimization.