Nostalgic Adam: Weighing more of the past gradients when designing the adaptive learning rate

@article{Huang2018NostalgicAW,
  title={Nostalgic Adam: Weighing more of the past gradients when designing the adaptive learning rate},
  author={Haiwen Huang and Chang Wang and Bin Dong},
  journal={CoRR},
  year={2018},
  volume={abs/1805.07557}
}
First-order optimization methods have been playing a prominent role in deep learning. Algorithms such as RMSProp and Adam are rather popular in training deep neural networks on large datasets. Recently, Reddi et al. [2018] discovered a flaw in the proof of convergence of Adam and proposed an alternative algorithm, AMSGrad, which has guaranteed convergence under certain conditions. In this paper, we propose a new algorithm, called Nostalgic Adam (NosAdam), which places bigger weights on the past gradients than the recent ones when designing the adaptive learning rate.
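
To make the idea concrete, below is a minimal NumPy sketch of an Adam-style update whose second-moment average places at least as much weight on past squared gradients as on recent ones, via a time-varying beta2_t = B_{t-1}/B_t with B_t = sum_{k<=t} b_k. The hyperharmonic weight b_k = k^(-gamma), the function name nostalgic_adam_sketch, and the hyperparameter defaults are illustrative assumptions, not the authors' reference implementation.

# Illustrative sketch (not the paper's reference code): an Adam-style update
# whose second-moment average weights past squared gradients at least as much
# as recent ones, using a time-varying beta2_t = B_{t-1} / B_t.
import numpy as np

def nostalgic_adam_sketch(grad_fn, x0, lr=1e-3, beta1=0.9, gamma=0.1,
                          eps=1e-8, steps=1000):
    """Run a NosAdam-like update on a parameter vector x0.

    grad_fn: callable returning the stochastic gradient at x.
    gamma:   controls how strongly past squared gradients are weighted
             (gamma = 0 reduces to a plain running average).
    """
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)          # first moment (as in Adam)
    v = np.zeros_like(x)          # weighted second moment
    B = 0.0                       # running normalizer B_t = sum_k b_k
    for t in range(1, steps + 1):
        g = grad_fn(x)
        b_t = t ** (-gamma)       # assumed hyperharmonic weight b_t = t^(-gamma)
        B_prev, B = B, B + b_t
        beta2_t = B_prev / B      # time-varying beta2; older terms keep a large share
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2_t * v + (1.0 - beta2_t) * g * g
        x -= lr * m / (np.sqrt(v) + eps)   # bias correction omitted for brevity
    return x

# Usage: minimize the quadratic f(x) = ||x||^2 / 2, whose gradient is x.
x_star = nostalgic_adam_sketch(lambda x: x, x0=np.ones(5))

With gamma = 0 every squared gradient gets equal weight (a plain running average), while larger gamma shifts relatively more weight onto older gradients.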

References


T. Tieleman and G. Hinton. Lecture 6.5-RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 2012.

D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. 2014.

Y. LeCun and C. Cortes. MNIST handwritten digit database. URL http://yann.lecun.com/exdb/mnist/, 2010.
