Corpus ID: 9668607

An Empirical Exploration of Recurrent Network Architectures

@inproceedings{Jozefowicz2015AnEE,
  title={An Empirical Exploration of Recurrent Network Architectures},
  author={Rafal J{\'o}zefowicz and Wojciech Zaremba and Ilya Sutskever},
  booktitle={ICML},
  year={2015}
}
The Recurrent Neural Network (RNN) is an extremely powerful sequence model that is often difficult to train. [...] Key Result: We found that adding a bias of 1 to the LSTM's forget gate closes the gap between the LSTM and the GRU.
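
The key result is straightforward to apply. Below is a minimal sketch, assuming PyTorch's nn.LSTM, whose gate biases are stored in the order (input, forget, cell, output) and split across bias_ih and bias_hh; the helper name init_forget_bias is illustrative, not a library function.

import torch
from torch import nn

def init_forget_bias(lstm: nn.LSTM, value: float = 1.0) -> None:
    """Set the effective forget-gate bias of every LSTM layer to `value`.

    The effective bias is the sum of bias_ih and bias_hh, so we write `value`
    into one forget-gate slice and zero the other to avoid double-counting.
    """
    h = lstm.hidden_size
    for name, param in lstm.named_parameters():
        if "bias_ih" in name:
            param.data[h:2 * h].fill_(value)  # forget-gate slice
        elif "bias_hh" in name:
            param.data[h:2 * h].zero_()

lstm = nn.LSTM(input_size=128, hidden_size=256, num_layers=2)
init_forget_bias(lstm, 1.0)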

Citations

Minimal gated unit for recurrent neural networks

TLDR
This work proposes a gated unit for RNNs, named the minimal gated unit (MGU), which contains only one gate, making it a minimal design among all gated hidden units.
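
For context, the MGU ties everything to that single (forget) gate: the gate both rescales the previous state inside the candidate computation and interpolates between the old and candidate states. A rough NumPy sketch of one step, assuming the commonly cited MGU formulation; the weight names (Wf, Uf, Wh, Uh) are illustrative.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mgu_step(x, h_prev, Wf, Uf, bf, Wh, Uh, bh):
    """One step of a Minimal Gated Unit: a single gate does all the work."""
    f = sigmoid(Wf @ x + Uf @ h_prev + bf)               # the only gate
    h_tilde = np.tanh(Wh @ x + Uh @ (f * h_prev) + bh)   # candidate state
    return (1.0 - f) * h_prev + f * h_tilde              # new hidden state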

Radically Simplifying Gated Recurrent Architectures Without Loss of Performance

TLDR
This study demonstrates that it is possible to radically simplify the MGU without significant loss of performance for some tasks and datasets, and that an extraordinarily simple Forget Gate architecture performs just as well as an MGU on the given task.

Restricted Recurrent Neural Networks

TLDR
Experiments on natural language modeling show that, compared with its classical counterpart, the restricted recurrent architecture generally produces comparable results at about a 50% compression rate; in particular, the Restricted LSTM can outperform the classical RNN with even fewer parameters.

Discovering Gated Recurrent Neural Network Architectures

TLDR
This chapter proposes a new method, evolution of a tree-based encoding of gated memory nodes, and shows that it explores new variations more effectively than other methods, discovering nodes with multiple recurrent paths and multiple memory cells that lead to significant improvements on the standard language modeling benchmark task.

Investigating gated recurrent neural networks for acoustic modeling

TLDR
The GRU usually performs better than the LSTM, possibly because the GRU can modulate the previous memory content through its learned reset gates, helping it model long-span dependencies in speech sequences more efficiently; the LSTMP shows performance comparable to the GRU.
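
For reference, a rough NumPy sketch of one GRU step in the common formulation, where the reset gate r modulates the previous memory content entering the candidate and the update gate z blends the old and candidate states; weight names are illustrative, and the sign convention for z varies between papers.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, Wr, Ur, Wz, Uz, Wh, Uh):
    """One GRU step: the reset gate gates the past, the update gate blends old and new."""
    r = sigmoid(Wr @ x + Ur @ h_prev)                # reset gate
    z = sigmoid(Wz @ x + Uz @ h_prev)                # update gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))    # candidate state
    return (1.0 - z) * h_prev + z * h_tilde          # new hidden state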

An Input Residual Connection for Simplifying Gated Recurrent Neural Networks

TLDR
The IRC is applicable not only to the GRNN designs of GRUs and LSTMs but also to FastGRNNs, Simple Recurrent Units (SRUs), and Strongly-Typed Recurrent Neural Networks (T-RNNs).

LSTM, GRU, Highway and a Bit of Attention: An Empirical Overview for Language Modeling in Speech Recognition

TLDR
It is found that highway connections enable both standalone feedforward and recurrent neural language models to benefit more from deep structure, and provide a slight improvement in recognition accuracy after interpolation with count models.

Developing a novel recurrent neural network architecture with fewer parameters and good learning performance

TLDR
This study attempts further improvements to the core structure and develops a novel, compact architecture with a high learning speed, which is expected to be useful for prediction and analysis of contextual data; it also suggests that there is room for the development of better architectures.

Recurrent Memory Networks for Language Modeling

TLDR
The Recurrent Memory Network (RMN) is proposed, a novel RNN architecture that not only amplifies the power of RNNs but also facilitates the understanding of their internal functioning and allows us to discover underlying patterns in data.
...

References

Showing 1–10 of 28 references

Learning to Execute

TLDR
This work developed a new variant of curriculum learning that improved the networks' performance in all experimental conditions and had a dramatic impact on an addition problem, enabling an LSTM to add two 9-digit numbers with 99% accuracy.

Learning to Forget: Continual Prediction with LSTM

TLDR
This work identifies a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset, and proposes a novel, adaptive forget gate that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources.

Learning Longer Memory in Recurrent Neural Networks

TLDR
This paper shows that learning longer term patterns in real data, such as in natural language, is perfectly possible using gradient descent, by using a slight structural modification of the simple recurrent neural network architecture.

LSTM: A Search Space Odyssey

TLDR
This paper presents the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling; it observes that the studied hyperparameters are virtually independent and derives guidelines for their efficient adjustment.

Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

TLDR
Advanced recurrent units that implement a gating mechanism, such as the long short-term memory (LSTM) unit and the recently proposed gated recurrent unit (GRU), are evaluated on sequence modeling, and the GRU is found to be comparable to the LSTM.

Sequence to Sequence Learning with Neural Networks

TLDR
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.

Learning Recurrent Neural Networks with Hessian-Free Optimization

TLDR
This work solves the long-standing problem of how to effectively train recurrent neural networks on complex and difficult sequence modeling problems which may contain long-term data dependencies, and offers a new interpretation of the generalized Gauss-Newton matrix of Schraudolph which is used within the HF approach of Martens.

On the importance of initialization and momentum in deep learning

TLDR
It is shown that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs to levels of performance that were previously achievable only with Hessian-Free optimization.

Recurrent Neural Network Regularization

TLDR
This paper shows how to correctly apply dropout to LSTMs, and shows that it substantially reduces overfitting on a variety of tasks.
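
The recipe, as usually summarized, is to apply dropout only to the non-recurrent connections (embedding to LSTM, between stacked layers, and LSTM to softmax), leaving the recurrent state path intact. A rough PyTorch sketch under that assumption; the class name and the hyperparameter values are illustrative placeholders.

import torch
from torch import nn

class DropoutLSTMLM(nn.Module):
    """Language model with dropout on non-recurrent connections only."""

    def __init__(self, vocab_size=10000, emb=650, hidden=650, layers=2, p=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.drop = nn.Dropout(p)
        # nn.LSTM's `dropout` argument acts between layers, i.e. only on
        # non-recurrent connections; the recurrent path is left untouched.
        self.lstm = nn.LSTM(emb, hidden, num_layers=layers, dropout=p)
        self.decoder = nn.Linear(hidden, vocab_size)

    def forward(self, tokens, state=None):
        x = self.drop(self.embed(tokens))            # dropout on the input side
        out, state = self.lstm(x, state)             # recurrence itself not dropped
        return self.decoder(self.drop(out)), state   # dropout before the softmax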

Long Short-Term Memory

TLDR
A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
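
As a reminder of what the constant error carousel refers to: in the original, forget-gate-free LSTM the cell state is updated purely additively, so along the direct cell path the gradient passes through unattenuated. A sketch of that relation in LaTeX; the notation is illustrative.

% additive cell update of the original LSTM (no forget gate)
c_t = c_{t-1} + i_t \odot g_t,
\qquad
\frac{\partial c_t}{\partial c_{t-1}} = I
% later variants use c_t = f_t \odot c_{t-1} + i_t \odot g_t, which is why a
% forget-gate bias near 1 (see the key result above) keeps this path open
% early in training.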