
@article{Zhuang2021MomentumCA,
  title={Momentum Centering and Asynchronous Update for Adaptive Gradient Methods},
  author={Juntang Zhuang and Yifan Ding and Tommy Tang and Nicha C. Dvornek and Sekhar C. Tatikonda and James S. Duncan},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.05454}
}
We propose ACProp (Asynchronous-centering-Prop), an adaptive optimizer which combines centering of the second momentum with an asynchronous update (e.g., for the t-th update, the denominator uses information up to step t − 1, while the numerator uses the gradient at step t). ACProp has both strong theoretical properties and strong empirical performance. Using the example by Reddi et al. (2018), we show that asynchronous optimizers (e.g., AdaShift, ACProp) have a weaker convergence condition than synchronous optimizers (e.g…
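The asynchronous, centered update described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the hyperparameter names (`beta1`, `beta2`, `eps`) and the decision to skip the very first parameter update (when no step t − 1 statistics exist yet) are assumptions made for the sketch:

```python
import numpy as np

def acprop_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ACProp-style step (sketch): the denominator is built from the
    centered second moment of step t-1 (asynchronous), while the numerator
    uses the gradient at step t."""
    prev_s = state["s"].copy()                               # s_{t-1}
    m = beta1 * state["m"] + (1 - beta1) * grad              # first moment m_t
    s = beta2 * state["s"] + (1 - beta2) * (grad - m) ** 2   # centered second moment s_t
    if state["t"] > 0:  # no s_{t-1} is available at the very first step
        theta = theta - lr * grad / (np.sqrt(prev_s) + eps)
    state.update(m=m, s=s, t=state["t"] + 1)
    return theta, state
```

Because the denominator depends only on gradients up to step t − 1, it is statistically decorrelated from the current gradient in the numerator, which is the property the convergence argument relies on.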

## References

Showing 1–10 of 56 references.
A Sufficient Condition for Convergences of Adam and RMSProp
• Fangyu Zou, Wei Liu
• Computer Science, Mathematics
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
• 2019
An alternative easy-to-check sufficient condition is introduced, which merely depends on the parameters of the base learning rate and combinations of historical second-order moments, to guarantee the global convergence of generic Adam/RMSProp for solving large-scale non-convex stochastic optimization.
AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods
• Computer Science, Mathematics
ICLR
• 2019
AdaShift is proposed, a novel adaptive learning rate method that decorrelates $v_t$ and $g_t$ by temporal shifting, i.e., using the temporally shifted gradient $g_{t-n}$ to calculate $v_t$.
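The temporal-shifting idea can be illustrated with a short sketch (a hypothetical helper for a single scalar gradient stream, simplified from the paper's full algorithm):

```python
from collections import deque

def shifted_v_update(v, history, grad, n=1, beta2=0.999):
    """Update the second moment v_t from the gradient of n steps ago,
    g_{t-n}, so that v_t is decorrelated from the current gradient g_t."""
    history.append(grad)
    if len(history) <= n:
        return v                   # not enough history yet: keep v unchanged
    g_shifted = history.popleft()  # g_{t-n}
    return beta2 * v + (1 - beta2) * g_shifted ** 2
```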
Adaptive Gradient Methods with Dynamic Bound of Learning Rate
• Computer Science, Mathematics
ICLR
• 2019
New variants of Adam and AMSGrad are provided, called AdaBound and AMSBound respectively, which employ dynamic bounds on learning rates to achieve a gradual and smooth transition from adaptive methods to SGD and give a theoretical proof of convergence.
Adam: A Method for Stochastic Optimization
• Computer Science, Mathematics
ICLR
• 2015
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
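The Adam update based on those lower-order moment estimates, written out for a single scalar parameter as a sketch (standard form with bias correction; variable names are my own):

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step with bias-corrected first and second moment
    estimates (t is 1-indexed)."""
    m = b1 * m + (1 - b1) * grad          # first moment (mean of gradients)
    v = b2 * v + (1 - b2) * grad * grad   # second moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)             # bias correction for zero init
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```

Note that here the numerator and denominator both use the current gradient, which is the synchrony that the ACProp abstract above contrasts with asynchronous methods.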
Adaptive Methods for Nonconvex Optimization
• Computer Science
NeurIPS
• 2018
The result implies that increasing minibatch sizes enables convergence, thus providing a way to circumvent the non-convergence issues, and provides a new adaptive optimization algorithm, Yogi, which controls the increase in effective learning rate, leading to even better performance with similar theoretical guarantees on convergence.
GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
• Mathematics, Computer Science
NIPS
• 2017
This work proposes a two time-scale update rule (TTUR) for training GANs with stochastic gradient descent on arbitrary GAN loss functions and introduces the "Fréchet Inception Distance" (FID), which captures the similarity of generated images to real ones better than the Inception Score.
Lookahead Optimizer: k steps forward, 1 step back
• Computer Science, Mathematics
NeurIPS
• 2019
Lookahead improves the learning stability and lowers the variance of its inner optimizer with negligible computation and memory cost, and can significantly improve the performance of SGD and Adam, even with their default hyperparameter settings.
On the Variance of the Adaptive Learning Rate and Beyond
• Computer Science, Mathematics
ICLR
• 2020
This work identifies a problem with the adaptive learning rate, suggests that warmup works as a variance reduction technique, and proposes RAdam, a new variant of Adam, by introducing a term to rectify the variance of the adaptive learning rate.
On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization
• Computer Science, Mathematics
ICLR
• 2019
A set of mild sufficient conditions are provided that guarantee the convergence for the Adam-type methods and it is proved that under these derived conditions, these methods can achieve the convergence rate of order $O(\log{T}/\sqrt{T})$ for nonconvex stochastic optimization.