Batch normalized recurrent neural networks

@article{laurent2016batch,
  title={Batch normalized recurrent neural networks},
  author={C{\'e}sar Laurent and G. Pereyra and Philemon Brakel and Y. Zhang and Yoshua Bengio},
  journal={2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2016}
}
  • Published 2016
  • Computer Science, Mathematics
  • Recurrent Neural Networks (RNNs) are powerful models for sequential data that have the potential to learn long-term dependencies. However, they are computationally expensive to train and difficult to parallelize. Recent work has shown that normalizing intermediate representations of neural networks can significantly improve convergence rates in feed-forward neural networks [1]. In particular, batch normalization, which uses mini-batch statistics to standardize features, was shown to…
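The core transform the abstract refers to, standardizing each feature with mini-batch statistics, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name and the scalar `gamma`/`beta` parameters are simplifications (in practice they are learned per-feature vectors).

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize each feature using statistics of the current mini-batch.

    x: array of shape (batch_size, num_features).
    gamma, beta: learnable scale and shift (scalars here for simplicity).
    eps: small constant for numerical stability.
    """
    mean = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance per feature
    return gamma * x_hat + beta              # restore representational capacity

# Illustrative mini-batch: 8 examples, 4 features, shifted and scaled
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(8, 4))
y = batch_norm(x)
```

After the transform, each feature column of `y` has (approximately) zero mean and unit variance over the mini-batch, which is the property the cited feed-forward results exploit and the paper investigates for recurrent networks.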
    153 Citations
    • Layer Normalization (1,859 citations)
    • Online Normalization for Training Neural Networks (16 citations)
    • Recurrent Batch Normalization (304 citations)
    • Layer-Normalized LSTM for Hybrid-Hmm and End-To-End ASR (3 citations)
    • A comprehensive study of batch construction strategies for recurrent neural networks in MXNet (9 citations)
    • Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN (251 citations)
    • Recurrent Residual Learning for Sequence Classification (74 citations)


    References
    • Recurrent Neural Network Regularization (1,616 citations; highly influential)
    • Speech recognition with deep recurrent neural networks (5,673 citations)
    • Bidirectional recurrent neural networks (3,832 citations; highly influential)
    • Scaling recurrent neural network language models (57 citations)
    • How to Construct Deep Recurrent Neural Networks (654 citations)
    • Sequence to Sequence Learning with Neural Networks (11,678 citations)
    • Natural Neural Networks (128 citations)
    • Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (21,339 citations; highly influential)
    • Hybrid speech recognition with Deep Bidirectional LSTM (1,067 citations)
    • Dropout: a simple way to prevent neural networks from overfitting (21,306 citations)