# Training RNNs as Fast as CNNs

@article{Lei2017TrainingRA, title={Training RNNs as Fast as CNNs}, author={Tao Lei and Yu Zhang and Yoav Artzi}, journal={ArXiv}, year={2017}, volume={abs/1709.02755} }

Common recurrent neural network architectures scale poorly due to the intrinsic difficulty in parallelizing their state computations. In this work, we propose the Simple Recurrent Unit (SRU) architecture, a recurrent unit that simplifies the computation and exposes more parallelism. In SRU, the majority of computation for each step is independent of the recurrence and can be easily parallelized. SRU is as fast as a convolutional layer and 5-10x faster than an optimized LSTM implementation. We…
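The abstract's central claim, that most per-step computation is independent of the recurrence, can be illustrated with a minimal sketch. The abstract does not give the SRU equations, so the update rule below is an assumption based on the commonly cited formulation (a forget gate, a reset gate, and a highway connection to the input). The names `sru_layer`, `matvec`, and `sigmoid` are hypothetical helpers, not the authors' implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sru_layer(xs, W, Wf, bf, Wr, br):
    """Run one SRU-style layer over xs, a list of d-dimensional vectors.

    Assumed update rule (not stated in the abstract):
        c_t = f_t * c_{t-1} + (1 - f_t) * (W x_t)
        h_t = r_t * tanh(c_t) + (1 - r_t) * x_t
    """
    d = len(xs[0])

    # Phase 1: these projections depend only on the inputs, never on the
    # previous state, so all time steps could be computed in parallel
    # (in practice, as one batched matrix multiplication).
    proj = [matvec(W, x) for x in xs]
    forget = [[sigmoid(a + b) for a, b in zip(matvec(Wf, x), bf)] for x in xs]
    reset = [[sigmoid(a + b) for a, b in zip(matvec(Wr, x), br)] for x in xs]

    # Phase 2: the recurrence is a cheap, purely elementwise loop.
    c = [0.0] * d
    hs = []
    for x, p, f, r in zip(xs, proj, forget, reset):
        c = [fi * ci + (1.0 - fi) * pi for fi, ci, pi in zip(f, c, p)]
        h = [ri * math.tanh(ci) + (1.0 - ri) * xi for ri, ci, xi in zip(r, c, x)]
        hs.append(h)
    return hs, c
```

In this sketch, all matrix work sits in phase 1, while phase 2 touches only elementwise products. That split is what would let an implementation parallelize across time steps for the expensive part and keep the sequential part lightweight.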

## 148 Citations

Efficient Sequence Learning with Group Recurrent Networks

- Computer Science
- NAACL
- 2018

An architecture is proposed to make RNN training more efficient. It adopts a group strategy for recurrent layers and a representation rearrangement strategy between layers as well as time steps.

Sliced Recurrent Neural Networks

- Computer Science
- COLING
- 2018

Sliced recurrent neural networks (SRNNs), which can be parallelized by slicing sequences into many subsequences, are introduced, and it is proved that the standard RNN is a special case of the SRNN when linear activation functions are used.

Rethinking Full Connectivity in Recurrent Neural Networks

- Computer Science
- ArXiv
- 2019

Structurally sparse RNNs are studied, showing that they are well suited for acceleration on parallel hardware, with a greatly reduced cost of the recurrent operations as well as orders of magnitude fewer recurrent weights.

Wider and Deeper, Cheaper and Faster: Tensorized LSTMs for Sequence Learning

- Computer Science
- NIPS
- 2017

The Tensorized LSTM is proposed, in which the hidden states are represented by tensors and updated via a cross-layer convolution, and the potential of the proposed model is shown.

EcoRNN: Fused LSTM RNN Implementation with Data Layout Optimization

- Computer Science
- ArXiv
- 2018

A new RNN implementation called EcoRNN is introduced that is significantly faster than the state-of-the-art open-source implementation in MXNet and competitive with the closed-source cuDNN. It is integrated into the MXNet Python library and open-sourced to benefit machine learning practitioners.

Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN

- Computer Science
- 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018

It is shown that an IndRNN can be easily regulated to prevent the gradient exploding and vanishing problems while allowing the network to learn long-term dependencies, and that it can work with non-saturated activation functions such as ReLU and still be trained robustly.

EcoRNN: Efficient Computing of LSTM RNN Training on GPUs

- Computer Science
- 2018

EcoRNN is proposed, incorporating two optimizations that significantly reduce the memory footprint and runtime. It is transparent to programmers because it automatically selects the best implementation using model hyperparameters.

Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks

- Computer Science
- ICLR
- 2018

The Skip RNN model is introduced, which extends existing RNN models by learning to skip state updates and shortens the effective size of the computational graph. It can reduce the number of required RNN updates while preserving, and sometimes even improving, the performance of the baseline RNN models.

Optimization of Recurrent Neural Networks on Natural Language Processing

- Computer Science
- ICCPR
- 2019

Investigating ways to further improve training speed and accuracy of RNNs in sentiment classification by combining methods of improving recurrent structure and recurrent units shows that not all combinations can result in an improvement, but rather significant improvements can be produced with the right arrangement of recurrent models.

Development of Recurrent Neural Networks and Its Applications to Activity Recognition

- Computer Science
- 2018

An independently recurrent neural network (IndRNN) is proposed to solve the gradient vanishing and exploding problem in the conventional RNNs and can learn very long-term patterns and can be stacked to construct very deep networks.

## References

Fast-Slow Recurrent Neural Networks

- Computer Science
- NIPS
- 2017

The approach is general as any kind of RNN cell is a possible building block for the FS-RNN architecture, and thus can be flexibly applied to different tasks.

genCNN: A Convolutional Architecture for Word Sequence Prediction

- Computer Science
- ACL
- 2015

It is argued that the proposed novel convolutional architecture, named genCNN, can give adequate representation of the history, and therefore can naturally exploit both the short and long range dependencies.

Language Modeling with Gated Convolutional Networks

- Computer Science
- ICML
- 2017

A finite context approach through stacked convolutions, which can be more efficient since they allow parallelization over sequential tokens, is developed, and this is the first time a non-recurrent approach is competitive with strong recurrent models on these large scale language tasks.

Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks

- Computer Science
- 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2015

This paper takes advantage of the complementarity of CNNs, LSTMs and DNNs by combining them into one unified architecture, and finds that the CLDNN provides a 4-6% relative improvement in WER over an LSTM, the strongest of the three individual models.

Optimizing Performance of Recurrent Neural Networks on GPUs

- Computer Science
- ArXiv
- 2016

It is demonstrated that by exposing parallelism between operations within the network, an order of magnitude speedup across a range of network sizes can be achieved over a naive implementation.

Convolutional Sequence to Sequence Learning

- Computer Science
- ICML
- 2017

This work introduces an architecture based entirely on convolutional neural networks, which outperforms the accuracy of the deep LSTM setup of Wu et al. (2016) on both WMT'14 English-German and WMT'14 English-French translation at an order of magnitude faster speed, both on GPU and CPU.

Quasi-Recurrent Neural Networks

- Computer Science
- ICLR
- 2017

Quasi-recurrent neural networks (QRNNs) are introduced, an approach to neural sequence modeling that alternates convolutional layers, which apply in parallel across timesteps, and a minimalist recurrent pooling function that applies in parallel across channels.

Persistent RNNs: Stashing Recurrent Weights On-Chip

- Computer Science
- ICML
- 2016

This paper introduces a new technique for mapping Deep Recurrent Neural Networks efficiently onto GPUs that uses persistent computational kernels that exploit the GPU's inverted memory hierarchy to reuse network weights over multiple timesteps.

An introduction to computational networks and the computational network toolkit (invited talk)

- Computer Science
- INTERSPEECH
- 2014

The computational network toolkit (CNTK), an implementation of CN that supports both GPU and CPU, is introduced. The architecture and key components of CNTK, the command line options for using CNTK, and the network definition and model editing language are described.

Simplifying long short-term memory acoustic models for fast training and decoding

- Computer Science
- 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2016

To accelerate decoding of LSTMs, it is proposed to apply frame skipping during training, and frame skipping and posterior copying (FSPC) during decoding to resolve two challenges faced by LSTM models: high model complexity and poor decoding efficiency.