Corpus ID: 10940950

On the importance of initialization and momentum in deep learning

  • Ilya Sutskever, J. Martens, G. Dahl, Geoffrey E. Hinton
  • Published in ICML 2013
  • Computer Science
  • Deep and recurrent neural networks (DNNs and RNNs respectively) are powerful models that were considered to be almost impossible to train using stochastic gradient descent with momentum. In this paper, we show that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs (on datasets with long-term dependencies) to levels of performance that were…
    2,682 Citations
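The training recipe summarized in the abstract, classical momentum with a slowly ramped momentum coefficient, can be sketched as follows. The ramp shape, cap `mu_max=0.99`, learning rate, and toy quadratic objective are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def momentum_schedule(t, mu_max=0.99):
    # Slowly increase the momentum coefficient toward mu_max over time.
    # (This particular ramp shape is an assumption for illustration,
    # not necessarily the paper's exact schedule.)
    return float(min(1.0 - 2.0 ** (-1.0 - np.log2(np.floor(t / 250.0) + 1.0)),
                     mu_max))

def momentum_step(theta, v, grad, lr, mu):
    # Classical momentum: v <- mu * v - lr * grad;  theta <- theta + v.
    v = mu * v - lr * grad
    return theta + v, v

# Toy demo on f(theta) = 0.5 * theta**2, whose gradient is theta itself.
theta, v = 5.0, 0.0
for t in range(1, 2001):
    mu = momentum_schedule(t)           # e.g. 0.5 at t=1, rising toward 0.99
    theta, v = momentum_step(theta, v, grad=theta, lr=0.01, mu=mu)
# theta is now close to the minimum at 0
```

The point of the ramp is that large momentum early in training can be destabilizing, while large momentum late in training accelerates progress along low-curvature directions.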


    MomentumRNN: Integrating Momentum into Recurrent Neural Networks (1 citation)
    Stochastic Gradient Descent with Nonlinear Conjugate Gradient-Style Adaptive Momentum (highly influenced)
    Accelerating Deep Neural Network Training with Inconsistent Stochastic Gradient Descent (35 citations)
    Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent (5 citations, highly influenced)
    Rethinking the PID Optimizer for Stochastic Optimization of Deep Networks (highly influenced)
    Towards Making Deep Transfer Learning Never Hurt (2 citations)
    A Novel Set of Weight Initialization Techniques for Deep Learning Architectures
    On Fast Dropout and its Applicability to Recurrent Networks (50 citations)
    Random Walk Initialization for Training Very Deep Feedforward Networks (52 citations)


    References

    Understanding the difficulty of training deep feedforward neural networks (9,246 citations, highly influential)
    Learning Recurrent Neural Networks with Hessian-Free Optimization (536 citations)
    Neural Networks: Tricks of the Trade (882 citations)
    Generating Text with Recurrent Neural Networks (1,049 citations)
    Greedy Layer-Wise Training of Deep Networks (1,507 citations)
    Deep Learning Made Easier by Linear Transformations in Perceptrons (163 citations)
    Training Deep and Recurrent Networks with Hessian-Free Optimization (179 citations)
    Learning long-term dependencies with gradient descent is difficult (4,894 citations, highly influential)
    Stochastic dynamics of learning with momentum in neural networks (40 citations)
    Dynamics and algorithms for stochastic search (12 citations)