On the importance of initialization and momentum in deep learning
@inproceedings{Sutskever2013OnTI,
  title     = {On the importance of initialization and momentum in deep learning},
  author    = {Ilya Sutskever and J. Martens and G. Dahl and Geoffrey E. Hinton},
  booktitle = {ICML},
  year      = {2013}
}
Deep and recurrent neural networks (DNNs and RNNs respectively) are powerful models that were considered to be almost impossible to train using stochastic gradient descent with momentum. In this paper, we show that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs (on datasets with long-term dependencies) to levels of performance that were…
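As a concrete illustration of the recipe the abstract describes, here is a minimal Python sketch (not the authors' released code) of stochastic gradient descent with classical momentum, an optional Nesterov-style lookahead gradient, and a slowly increasing momentum schedule. The schedule min(1 - 2^(-1 - log2(floor(t/250) + 1)), mu_max) matches the increasing schedule used in the paper's experiments; the constant 250, mu_max, the learning rate, and the toy quadratic objective should all be read as illustrative assumptions.

```python
import numpy as np

def momentum_schedule(t, mu_max=0.99):
    """Slowly raise the momentum coefficient from 0.5 toward mu_max.

    At t = 0 this returns 0.5; each time floor(t / 250) doubles, the gap
    to 1 is halved, and the value is capped at mu_max.
    """
    return min(1.0 - 2.0 ** (-1.0 - np.log2(np.floor(t / 250.0) + 1.0)), mu_max)

def sgd_with_momentum(grad, theta0, lr=0.01, steps=1000, nesterov=False):
    """Classical momentum: v <- mu*v - lr*grad(theta); theta <- theta + v.

    With nesterov=True, the gradient is evaluated at the lookahead point
    theta + mu*v instead of theta (Nesterov's accelerated gradient,
    rewritten in momentum form).
    """
    theta = theta0.copy()
    v = np.zeros_like(theta)
    for t in range(steps):
        mu = momentum_schedule(t)
        g = grad(theta + mu * v) if nesterov else grad(theta)
        v = mu * v - lr * g
        theta = theta + v
    return theta

# Toy usage (an assumption for illustration): minimize the ill-conditioned
# quadratic f(theta) = 0.5 * theta^T A theta, whose gradient is A @ theta.
A = np.diag([1.0, 10.0, 100.0])
grad = lambda th: A @ th
theta_star = sgd_with_momentum(grad, np.ones(3), lr=0.005, nesterov=True)
print(theta_star)  # should approach the zero vector, the minimizer
```

The nesterov=True branch evaluates the gradient at theta + mu*v, which is the reformulation of Nesterov's accelerated gradient that the paper relates to classical momentum; the schedule keeps momentum low early in training and raises it only once the parameters are in a better-conditioned region.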
2,682 Citations
- MomentumRNN: Integrating Momentum into Recurrent Neural Networks. NeurIPS, 2020. 1 citation.
- Stochastic Gradient Descent with Nonlinear Conjugate Gradient-Style Adaptive Momentum. ArXiv, 2020. Highly Influenced.
- Accelerating Deep Neural Network Training with Inconsistent Stochastic Gradient Descent. Neural Networks, 2017. 35 citations.
- Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent. ArXiv, 2020. 5 citations. Highly Influenced.
- Rethinking the PID Optimizer for Stochastic Optimization of Deep Networks. 2020 IEEE International Conference on Multimedia and Expo (ICME), 2020. Highly Influenced.
- Towards Making Deep Transfer Learning Never Hurt. 2019 IEEE International Conference on Data Mining (ICDM), 2019. 2 citations.
- On Fast Dropout and its Applicability to Recurrent Networks. ICLR, 2014. 50 citations.
- Random Walk Initialization for Training Very Deep Feedforward Networks. 2014. 52 citations.