# Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

```bibtex
@article{Chung2014EmpiricalEO,
  title   = {Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling},
  author  = {Junyoung Chung and Çaglar G{\"u}lçehre and Kyunghyun Cho and Yoshua Bengio},
  journal = {ArXiv},
  year    = {2014},
  volume  = {abs/1412.3555}
}
```

#### 6,420 Citations

Memory visualization for gated recurrent neural networks in speech recognition

- Computer Science
- 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2017

Recurrent neural networks (RNNs) have shown clear superiority in sequence modeling, particularly the ones with gated units, such as long short-term memory (LSTM) and gated recurrent unit (GRU). …

Evaluation of Gated Recurrent Neural Networks in Music Classification Tasks

- Computer Science
- ISAT
- 2017

A key result is a significant improvement of classification accuracy achieved by training the recurrent network on random short subsequences of the vector sequences in the training set.

The Statistical Recurrent Unit

- Computer Science, Mathematics
- ICML
- 2017

The efficacy of SRUs as compared to LSTMs and GRUs is shown in an unbiased manner by optimizing the respective architectures' hyperparameters for both synthetic and real-world tasks.

Investigating gated recurrent neural networks for acoustic modeling

- Computer Science
- 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP)
- 2016

GRU usually performs better than LSTM, possibly because GRU can modulate the previous memory content through its learned reset gates, helping it model long-span dependencies in speech sequences more efficiently; LSTMP shows performance comparable to GRU.
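The reset-gate mechanism referred to above can be sketched as a single GRU step. This is a minimal numpy illustration with hypothetical toy dimensions and no bias terms, not any paper's reference implementation: the reset gate `r` controls how much of the previous hidden state feeds the candidate activation, and the update gate `z` interpolates between old and new state.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, params):
    """One GRU step (Cho et al., 2014 formulation, sketch).

    z: update gate; r: reset gate. r modulates how much of the
    previous hidden state enters the candidate activation h_tilde.
    """
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(Wz @ x + Uz @ h_prev)               # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))   # candidate state
    return (1 - z) * h_prev + z * h_tilde           # interpolate old/new

# Toy sizes (hypothetical): input dim 3, hidden dim 4.
rng = np.random.default_rng(0)
params = [rng.standard_normal((4, 3)) if i % 2 == 0 else rng.standard_normal((4, 4))
          for i in range(6)]
h = np.zeros(4)
for t in range(5):
    h = gru_step(rng.standard_normal(3), h, params)
```

Because the candidate is a tanh and the state starts at zero, the hidden state stays bounded in (-1, 1) by induction over the interpolation.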

GRUV: Algorithmic Music Generation using Recurrent Neural Networks

- 2015

We compare the performance of two different types of recurrent neural networks (RNNs) for the task of algorithmic music generation, with audio waveforms as input. In particular, we focus on RNNs that …

Simplified minimal gated unit variations for recurrent neural networks

- Computer Science, Mathematics
- 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS)
- 2017

Three variants of the minimal gated unit (MGU), which further simplify that design by reducing the number of parameters in the forget-gate equation, are introduced and shown to achieve accuracy similar to the MGU model while using fewer parameters and thus incurring lower training cost.

Residual Recurrent Neural Networks for Learning Sequential Representations

- Computer Science
- Inf.
- 2018

The results show that reformulating the RNN unit to learn residual functions with reference to the hidden state gives state-of-the-art performance, outperforming LSTM and GRU layers in speed while achieving accuracy competitive with the other methods.

Simplified gating in long short-term memory (LSTM) recurrent neural networks

- Computer Science, Mathematics
- 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS)
- 2017

Three new parameter-reduced variants, obtained by eliminating combinations of the input signal, bias, and hidden-unit signals from individual gating signals, can achieve performance comparable to the standard LSTM model with fewer (adaptive) parameters.

Sequence Modeling using Gated Recurrent Neural Networks

- Computer Science
- ArXiv
- 2015

This paper uses recurrent neural networks to capture and model human motion data, generating motions by predicting the next immediate data point at each time step, and demonstrates that the model is able to capture long-term dependencies in the data and generate realistic motions.

Going Wider: Recurrent Neural Network With Parallel Cells

- Computer Science
- ArXiv
- 2017

A simple technique called parallel cells (PCs) is proposed to enhance the learning ability of a recurrent neural network (RNN) by running multiple small RNN cells rather than one single large cell in each layer.

#### References

Showing 1–10 of 22 references

Learning long-term dependencies with gradient descent is difficult

- Computer Science, Medicine
- IEEE Trans. Neural Networks
- 1994

This work shows why gradient-based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching onto information for long periods.

Advances in optimizing recurrent networks

- Computer Science
- 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013

Experiments reported here evaluate the use of clipping gradients, spanning longer time ranges with leaky integration, advanced momentum techniques, using more powerful output probability models, and encouraging sparser gradients to help symmetry breaking and credit assignment.

Speech recognition with deep recurrent neural networks

- Computer Science
- 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013

This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long-range context that empowers RNNs.

Sequence to Sequence Learning with Neural Networks

- Computer Science
- NIPS
- 2014

This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short-term dependencies between the source and the target sentence, which made the optimization problem easier.
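The source-reversal trick described above is a pure preprocessing step. A minimal sketch (hypothetical token lists, not the authors' pipeline) reverses each source sentence while leaving the target untouched, so early target words end up closer to their source counterparts:

```python
# Hypothetical (source, target) token-sequence pairs. Reversing only the
# source shortens the distance between a source word and the target word
# it aligns to near the start of decoding.
def reverse_source(pairs):
    """Reverse each source sentence; leave the target side untouched."""
    return [(list(reversed(src)), tgt) for src, tgt in pairs]

pairs = [(["je", "suis", "ici"], ["i", "am", "here"])]
reversed_pairs = reverse_source(pairs)
# reversed_pairs[0][0] == ["ici", "suis", "je"]
```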

Generating Sequences With Recurrent Neural Networks

- Computer Science
- ArXiv
- 2013

This paper shows how Long Short-term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time. The approach …

Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription

- Computer Science, Mathematics
- ICML
- 2012

A probabilistic model based on distribution estimators conditioned on a recurrent neural network, able to discover temporal dependencies in high-dimensional sequences, is introduced and shown to outperform many traditional models of polyphonic music on a variety of realistic datasets.

Supervised Sequence Labelling with Recurrent Neural Networks

- Computer Science
- Studies in Computational Intelligence
- 2008

Introduces a new type of output layer that allows recurrent networks to be trained directly for sequence labelling tasks where the alignment between inputs and labels is unknown, and extends the long short-term memory architecture to multidimensional data such as images and video sequences.

On the difficulty of training recurrent neural networks

- Computer Science, Mathematics
- ICML
- 2013

This paper proposes a gradient norm clipping strategy to deal with exploding gradients and a soft constraint for the vanishing gradients problem, and validates the hypothesis and proposed solutions empirically.
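Gradient norm clipping of the kind this reference proposes can be sketched in a few lines: when the global L2 norm of all gradients exceeds a threshold, every gradient is rescaled by the same factor so the global norm equals the threshold. This is a rough numpy illustration under assumed toy values, not the authors' code:

```python
import numpy as np

def clip_by_global_norm(grads, threshold):
    """Rescale all gradients by a common factor when their global L2
    norm exceeds `threshold` (gradient norm clipping, sketch)."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if global_norm > threshold:
        scale = threshold / global_norm
        grads = [g * scale for g in grads]
    return grads, global_norm

# Toy gradients: global norm = sqrt(4*9 + 3*16) = sqrt(84) > 1.0,
# so clipping fires and rescales everything to global norm 1.0.
grads = [np.full(4, 3.0), np.full(3, 4.0)]
clipped, norm = clip_by_global_norm(grads, 1.0)
```

Rescaling by a single shared factor preserves the gradient's direction, which is why norm clipping is preferred over clipping each component independently.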

Learning Recurrent Neural Networks with Hessian-Free Optimization

- Computer Science
- ICML
- 2011

This work solves the long-outstanding problem of how to effectively train recurrent neural networks on complex and difficult sequence modeling problems which may contain long-term data dependencies, and offers a new interpretation of the generalized Gauss-Newton matrix of Schraudolph, which is used within the HF approach of Martens.

Learned-Norm Pooling for Deep Feedforward and Recurrent Neural Networks

- Computer Science, Mathematics
- ECML/PKDD
- 2014

It is shown that multilayer perceptrons (MLPs) consisting of the proposed Lp units achieve state-of-the-art results on a number of benchmark datasets, and the Lp unit is also evaluated on the recently proposed deep recurrent neural networks (RNNs).