Corpus ID: 3393165

Training RNNs as Fast as CNNs

@article{Lei2017TrainingRA,
  title={Training RNNs as Fast as CNNs},
  author={Tao Lei and Yu Zhang and Yoav Artzi},
  journal={ArXiv},
  year={2017},
  volume={abs/1709.02755}
}
Common recurrent neural network architectures scale poorly due to the intrinsic difficulty in parallelizing their state computations. In this work, we propose the Simple Recurrent Unit (SRU) architecture, a recurrent unit that simplifies the computation and exposes more parallelism. In SRU, the majority of computation for each step is independent of the recurrence and can be easily parallelized. SRU is as fast as a convolutional layer and 5-10x faster than an optimized LSTM implementation. We… 
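
The abstract's claim that most per-step computation is independent of the recurrence is easy to see in code. The NumPy sketch below is a rough illustration of an SRU-style layer, not the authors' reference implementation; the gating formulation (a forget gate f and a highway gate r) and all names are assumptions made for the example. The point it shows is that every matrix multiplication touches only the inputs, so it can be computed for all time steps at once, and the sequential loop performs nothing but cheap elementwise updates.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru_layer(x, W, Wf, bf, Wr, br):
    """Sketch of an SRU-style layer over an input sequence x of shape (T, d).

    The three matrix products below depend only on x, so they are computed
    for the whole sequence up front (and could run in parallel); only the
    elementwise recurrence over the internal state c is sequential.
    """
    T, d = x.shape
    xt = x @ W                   # candidate values for every step at once
    f = sigmoid(x @ Wf + bf)     # forget gates for every step at once
    r = sigmoid(x @ Wr + br)     # highway gates for every step at once

    c = np.zeros(d)
    h = np.empty_like(x)
    for t in range(T):           # the only sequential part: elementwise ops
        c = f[t] * c + (1.0 - f[t]) * xt[t]
        h[t] = r[t] * np.tanh(c) + (1.0 - r[t]) * x[t]
    return h

# Toy usage with random weights (hypothetical sizes, for illustration only).
d = 128
rng = np.random.default_rng(0)
x = rng.standard_normal((50, d))
W, Wf, Wr = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
bf = np.zeros(d)
br = np.zeros(d)
print(sru_layer(x, W, Wf, bf, Wr, br).shape)  # (50, 128)
```

Because the heavy matrix products are hoisted out of the time loop, the sequential portion is only O(T·d) elementwise work over the hidden dimension, which is what allows the unit to approach convolution-like speed; the 5-10x figure quoted above refers to the paper's optimized implementation, not to a didactic sketch like this.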

Citations

Efficient Sequence Learning with Group Recurrent Networks
TLDR
An efficient architecture is proposed to improve the efficiency of RNN training, adopting a group strategy for recurrent layers while exploiting a representation-rearrangement strategy between layers as well as time steps.
Sliced Recurrent Neural Networks
TLDR
Sliced recurrent neural networks (SRNNs), which can be parallelized by slicing sequences into many subsequences, are introduced, and it is proved that the standard RNN is a special case of the SRNN when linear activation functions are used.
Rethinking Full Connectivity in Recurrent Neural Networks
TLDR
Structurally sparse RNNs are studied, showing that they are well suited for acceleration on parallel hardware, with a greatly reduced cost of the recurrent operations as well as orders of magnitude fewer recurrent weights.
Wider and Deeper, Cheaper and Faster: Tensorized LSTMs for Sequence Learning
TLDR
The Tensorized LSTM is proposed, in which the hidden states are represented by tensors and updated via a cross-layer convolution; experiments demonstrate the potential of the proposed model.
EcoRNN: Fused LSTM RNN Implementation with Data Layout Optimization
TLDR
A new RNN implementation called EcoRNN is introduced that is significantly faster than the state-of-the-art open-source implementation in MXNet and competitive with the closed-source cuDNN; it is integrated into the MXNet Python library and open-sourced to benefit machine learning practitioners.
Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN
TLDR
It is shown that an IndRNN can be easily regulated to prevent the exploding and vanishing gradient problems while allowing the network to learn long-term dependencies, and that it can work with non-saturating activation functions such as ReLU and still be trained robustly.
EcoRNN: Efficient Computing of LSTM RNN Training on GPUs
TLDR
EcoRNN is proposed, incorporating two optimizations that significantly reduce memory footprint and runtime; it is transparent to programmers, since EcoRNN automatically selects the best implementation using the model hyperparameters.
Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks
TLDR
The Skip RNN model is introduced, which extends existing RNN models by learning to skip state updates, shortening the effective size of the computational graph; this can reduce the number of required RNN updates while preserving, and sometimes even improving, the performance of the baseline RNN models.
Optimization of Recurrent Neural Networks on Natural Language Processing
TLDR
Investigating ways to further improve the training speed and accuracy of RNNs for sentiment classification by combining improvements to the recurrent structure and the recurrent units shows that not all combinations result in an improvement, but that significant improvements can be produced with the right arrangement of recurrent models.
Development of Recurrent Neural Networks and Its Applications to Activity Recognition
TLDR
An independently recurrent neural network (IndRNN) is proposed to solve the vanishing and exploding gradient problems of conventional RNNs; it can learn very long-term patterns and can be stacked to construct very deep networks.

References

SHOWING 1-10 OF 88 REFERENCES
Fast-Slow Recurrent Neural Networks
TLDR
The approach is general, as any kind of RNN cell is a possible building block for the FS-RNN architecture, which can thus be flexibly applied to different tasks.
genCNN: A Convolutional Architecture for Word Sequence Prediction
TLDR
It is argued that the proposed novel convolutional architecture, named genCNN, can give an adequate representation of the history and can therefore naturally exploit both short- and long-range dependencies.
Language Modeling with Gated Convolutional Networks
TLDR
A finite-context approach through stacked convolutions, which can be more efficient since they allow parallelization over sequential tokens, is developed; this is the first time a non-recurrent approach is competitive with strong recurrent models on these large-scale language tasks.
Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks
TLDR
This paper takes advantage of the complementarity of CNNs, LSTMs and DNNs by combining them into one unified architecture, and finds that the CLDNN provides a 4-6% relative improvement in WER over an LSTM, the strongest of the three individual models.
Optimizing Performance of Recurrent Neural Networks on GPUs
TLDR
It is demonstrated that by exposing parallelism between operations within the network, an order of magnitude speedup across a range of network sizes can be achieved over a naive implementation.
Convolutional Sequence to Sequence Learning
TLDR
This work introduces an architecture based entirely on convolutional neural networks, which outperforms the accuracy of the deep LSTM setup of Wu et al. (2016) on both WMT'14 English-German and WMT'14 English-French translation at an order of magnitude faster speed, both on GPU and CPU.
Quasi-Recurrent Neural Networks
TLDR
Quasi-recurrent neural networks (QRNNs), an approach to neural sequence modeling that alternates convolutional layers, which apply in parallel across timesteps, with a minimalist recurrent pooling function that applies in parallel across channels, are introduced.
Persistent RNNs: Stashing Recurrent Weights On-Chip
TLDR
This paper introduces a new technique for mapping Deep Recurrent Neural Networks efficiently onto GPUs that uses persistent computational kernels that exploit the GPU's inverted memory hierarchy to reuse network weights over multiple timesteps.
An introduction to computational networks and the computational network toolkit (invited talk)
TLDR
The Computational Network Toolkit (CNTK), an implementation of computational networks that supports both GPU and CPU, is introduced; the architecture and key components of CNTK, the command-line options for using CNTK, and the network definition and model editing language are described.
Simplifying long short-term memory acoustic models for fast training and decoding
TLDR
To resolve two challenges faced by LSTM models, high model complexity and poor decoding efficiency, it is proposed to apply frame skipping during training, and frame skipping with posterior copying (FSPC) during decoding.