On the Convergence of AdaGrad with Momentum for Training Deep Neural Networks

@article{Zou2018OnTC,
  title={On the Convergence of AdaGrad with Momentum for Training Deep Neural Networks},
  author={Fangyu Zou and Li Shen},
  journal={CoRR},
  year={2018},
  volume={abs/1808.03408}
}
Adaptive stochastic gradient descent methods, such as AdaGrad, Adam, AdaDelta, Nadam, and AMSGrad, have proven effective for non-convex stochastic optimization, notably training deep neural networks. However, their convergence rates in the non-convex stochastic setting remain largely unexamined, apart from recent breakthrough results on AdaGrad [34] and perturbed AdaGrad [22]. In this paper, we propose two new adaptive stochastic gradient methods called AdaHB and AdaNAG…
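The abstract is truncated and does not spell out the AdaHB or AdaNAG updates, so as a rough illustration only, here is a minimal sketch of combining AdaGrad's per-coordinate adaptive step sizes with heavy-ball momentum in NumPy. The function name adagrad_hb, the hyperparameters, and the quadratic example are all illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def adagrad_hb(grad_fn, x0, lr=0.1, beta=0.9, eps=1e-8, steps=100):
    """Illustrative AdaGrad + heavy-ball momentum sketch (not the paper's AdaHB).

    grad_fn(x) returns a (possibly stochastic) gradient at x.
    v accumulates squared gradients coordinate-wise (AdaGrad);
    m carries the heavy-ball momentum term.
    """
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)  # running sum of squared gradients
    m = np.zeros_like(x)  # momentum buffer
    for _ in range(steps):
        g = grad_fn(x)
        v += g * g                           # AdaGrad accumulator
        step = lr * g / (np.sqrt(v) + eps)   # coordinate-wise adaptive step
        m = beta * m + step                  # heavy-ball momentum update
        x = x - m
    return x

# Toy usage: minimize f(x) = ||x||^2 / 2, whose gradient is x.
x_star = adagrad_hb(lambda x: x, x0=[3.0, -2.0])
```

The design point the sketch is meant to convey is that the adaptive scaling (division by the square root of the accumulated squared gradients) and the momentum recursion are applied as two separate, composable modifications to plain SGD.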