Advances in optimizing recurrent networks

  • Yoshua Bengio, Nicolas Boulanger-Lewandowski, Razvan Pascanu
  • Published 2013
  • Computer Science, Mathematics
  • 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
  • After a more than decade-long period of relatively little research activity in the area of recurrent neural networks, several new developments will be reviewed here that have allowed substantial progress both in understanding and in technical solutions towards more efficient training of recurrent networks. These advances have been motivated by and related to the optimization issues surrounding deep learning. Although recurrent networks are extremely powerful in what they can in principle…
    375 Citations
    Learning Multiple Timescales in Recurrent Neural Networks (7 citations)
    Learning Longer Memory in Recurrent Neural Networks (190 citations)
    On Fast Dropout and its Applicability to Recurrent Networks (50 citations)
    Sampling-Based Gradient Regularization for Capturing Long-Term Dependencies in Recurrent Neural Networks (1 citation)
    Recent Advances in Recurrent Neural Networks (141 citations)
    Regularizing Recurrent Networks - On Injected Noise and Norm-based Methods (2 citations)
    Conditional Computation in Deep and Recurrent Neural Networks (1 citation)
    Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling (4,904 citations)


    References
    Learning Recurrent Neural Networks with Hessian-Free Optimization (527 citations)
    Extensions of recurrent neural network language model (1,183 citations)
    Training recurrent neural networks (289 citations)
    Learning long-term dependencies with gradient descent is difficult (4,739 citations)
    Greedy Layer-Wise Training of Deep Networks (2,523 citations)
    Temporal-Kernel Recurrent Neural Networks (39 citations)
    Context dependent recurrent neural network language model (Tomas Mikolov, G. Zweig; 2012 IEEE Spoken Language Technology Workshop (SLT), 2012; 470 citations)
    Understanding the exploding gradient problem (343 citations)
    Hierarchical Recurrent Neural Networks for Long-Term Dependencies (280 citations)
    Why Does Unsupervised Pre-training Help Deep Learning? (1,228 citations)