On the importance of initialization and momentum in deep learning

  • Published 2013

Abstract

Figure 2. Trajectories of classical momentum (CM), Nesterov's accelerated gradient (NAG), and SGD. Although the momentum coefficient is identical in both momentum experiments, CM oscillates along the high-curvature directions, while NAG exhibits no such oscillations. The global minimizer of the objective is at (0,0). The red curve shows gradient descent with the same learning rate as NAG and CM, the blue curve shows NAG, and the green curve shows CM. See section 2 of the paper.
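The behaviour described in the caption can be reproduced qualitatively with a small sketch. The code below is not the paper's experiment: the two-dimensional quadratic objective, its curvatures (0.1 and 10), the starting point (1, 1), the learning rate 0.09, the momentum 0.9, and the helper names `grad`, `run`, and `overshoot` are all assumptions chosen for illustration. It uses the standard update rules, in which classical momentum (CM) evaluates the gradient at the current point and Nesterov's accelerated gradient (NAG) evaluates it at the look-ahead point theta + mu * v.

```python
# Toy comparison of gradient descent, CM, and NAG on an assumed quadratic
# f(x, y) = 0.5 * (0.1 * x**2 + 10 * y**2), whose minimizer is (0, 0).
# All constants here are illustrative assumptions, not values from the paper.

CURV = (0.1, 10.0)   # assumed curvatures: low along x, high along y
LR = 0.09            # assumed learning rate, shared by all three methods
MU = 0.9             # assumed momentum coefficient, shared by CM and NAG
START = (1.0, 1.0)   # assumed starting point
STEPS = 100


def grad(theta, curv):
    """Gradient of the diagonal quadratic 0.5 * sum(c_i * theta_i**2)."""
    return tuple(c * t for c, t in zip(curv, theta))


def run(method, theta0, curv, lr, mu, steps):
    """Return the list of iterates for method in {"gd", "cm", "nag"}."""
    theta = tuple(theta0)
    v = (0.0,) * len(theta)
    traj = [theta]
    for _ in range(steps):
        if method == "nag":
            # NAG: gradient at the look-ahead point theta + mu * v.
            point = tuple(t + mu * vi for t, vi in zip(theta, v))
        else:
            # GD and CM: gradient at the current point.
            point = theta
        g = grad(point, curv)
        m = 0.0 if method == "gd" else mu  # plain GD carries no velocity
        v = tuple(m * vi - lr * gi for vi, gi in zip(v, g))
        theta = tuple(t + vi for t, vi in zip(theta, v))
        traj.append(theta)
    return traj


def overshoot(traj, axis=1):
    """Largest excursion past the minimizer along one axis (start is positive)."""
    return max(0.0, max(-p[axis] for p in traj))


gd_traj = run("gd", START, CURV, LR, MU, STEPS)
cm_traj = run("cm", START, CURV, LR, MU, STEPS)
nag_traj = run("nag", START, CURV, LR, MU, STEPS)

for name, traj in (("gd", gd_traj), ("cm", cm_traj), ("nag", nag_traj)):
    print(name, "overshoot along y:", overshoot(traj), "final point:", traj[-1])
```

In this toy setting CM swings far past the minimizer along the high-curvature y axis, while NAG overshoots only slightly and settles quickly. Both momentum methods make much faster progress than plain gradient descent along the low-curvature x axis. Plotting the three trajectories would give a picture in the spirit of the figure, though not a reproduction of it.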


Cite this paper

@inproceedings{2013OnTI, title={On the importance of initialization and momentum in deep learning}, author={}, year={2013} }