Corpus ID: 222377595

AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients

@article{Zhuang2020AdaBeliefOA,
  title={AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients},
  author={Juntang Zhuang and Tommy Tang and Yifan Ding and S. Tatikonda and N. Dvornek and X. Papademetris and J. Duncan},
  journal={ArXiv},
  year={2020},
  volume={abs/2010.07468}
}
Most popular optimizers for deep learning can be broadly categorized as adaptive methods (e.g. Adam) and accelerated schemes (e.g. stochastic gradient descent (SGD) with momentum). For many models such as convolutional neural networks (CNNs), adaptive methods typically converge faster but generalize worse compared to SGD; for complex settings such as generative adversarial networks (GANs), adaptive methods are typically the default because of their stability. We propose AdaBelief to …
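To make "adapting stepsizes by the belief in observed gradients" concrete, the sketch below shows the single change that distinguishes this kind of update from Adam: the second-moment accumulator tracks the squared deviation of each gradient from its exponential moving average (the "belief"), rather than the squared gradient itself, so the step shrinks when the observed gradient disagrees with the belief and grows toward SGD-with-momentum behavior when it agrees. This is a minimal NumPy sketch of that idea, not the authors' reference implementation; details such as the exact epsilon placement, bias correction, and any decoupled weight decay in the published algorithm may differ.

```python
import numpy as np

def adabelief_step(theta, grad, m, s, t, lr=1e-3,
                   beta1=0.9, beta2=0.999, eps=1e-8):
    """One belief-based parameter update (minimal sketch, not the official code).

    Adam rescales steps by an EMA of grad**2; the belief-based variant instead
    tracks an EMA of (grad - m)**2, the squared deviation of the observed
    gradient from the EMA "belief" m. A large deviation means low belief and a
    smaller step; a small deviation lets the step approach SGD with momentum.
    """
    m = beta1 * m + (1 - beta1) * grad                   # EMA of gradients (the "belief")
    s = beta2 * s + (1 - beta2) * (grad - m) ** 2        # EMA of squared deviation from the belief
    m_hat = m / (1 - beta1 ** t)                         # bias correction, as in Adam
    s_hat = s / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(s_hat) + eps)  # adaptive step
    return theta, m, s

# Toy usage: minimize f(x) = x**2 starting from x = 5.
theta, m, s = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 501):
    grad = 2.0 * theta
    theta, m, s = adabelief_step(theta, grad, m, s, t, lr=0.1)
print(theta)  # final iterate, near the minimum at 0
```

Replacing `(grad - m) ** 2` with `grad ** 2` in the sketch recovers an Adam-style step, which is the cleanest way to see that the two methods differ only in what the denominator "believes" about gradient variability.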
22 Citations
EAdam Optimizer: How $\epsilon$ Impact Adam
Short-Term Load Forecasting Based on Adabelief Optimized Temporal Convolutional Network and Gated Recurrent Unit Hybrid Neural Network
  • Highly Influenced
Generalizing Adversarial Examples by AdaBelief Optimizer
  • Highly Influenced
Heterogeneous Graph based Deep Learning for Biomedical Network Link Prediction
  • Highly Influenced
Opening the Blackbox: Accelerating Neural Differential Equations by Regularizing Internal Solver Heuristics
  • Highly Influenced
Adaptive Learning Rates with Maximum Variation Averaging
  • Highly Influenced
An AB-CNN intelligent fault location recognition model for induction motor
FastAdaBelief: Improving Convergence Rate for Belief-based Adaptive Optimizer by Strong Convexity
  • Yangfan Zhou, Kaizhu Huang, Cheng Cheng, Xuguang Wang, Xin Liu
  • Computer Science, Mathematics
  • ArXiv
  • 2021

References

Showing 1-10 of 62 references
Lookahead Optimizer: k steps forward, 1 step back
  • 199
Improving Generalization Performance by Switching from Adam to SGD
  • 226
GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
  • 2,683
  • Highly Influential
On the Convergence of Adam and Beyond
  • 1,084
  • Highly Influential
The Marginal Value of Adaptive Gradient Methods in Machine Learning
  • 567
Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks
  • 71
  • Highly Influential
Nostalgic Adam: Weighing more of the past gradients when designing the adaptive learning rate
  • 22
Improved Training of Wasserstein GANs
  • 4,205
  • Highly Influential
signSGD: compressed optimisation for non-convex problems
  • 304
Adaptive Gradient Methods with Dynamic Bound of Learning Rate
  • 242
  • Highly Influential