LSTM: A Search Space Odyssey

@article{Greff2017LSTMAS,
  title={LSTM: A Search Space Odyssey},
  author={Klaus Greff and Rupesh Kumar Srivastava and Jan Koutn{\'i}k and Bas R. Steunebrink and J{\"u}rgen Schmidhuber},
  journal={IEEE Transactions on Neural Networks and Learning Systems},
  year={2017},
  volume={28},
  pages={2222-2232}
}
Several variants of the long short-term memory (LSTM) architecture for recurrent neural networks have been proposed since its inception in 1995. In recent years, these networks have become the state-of-the-art models for a variety of machine learning problems. This has led to a renewed interest in understanding the role and utility of various computational components of typical LSTM variants. In this paper, we present the first large-scale analysis of eight LSTM variants on three representative… 
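
To ground the terminology used throughout the listings below, the vanilla LSTM studied in the paper combines input, forget, and output gates with a memory cell (the paper also evaluates peephole connections and other component ablations, omitted here). The following NumPy snippet is only an illustrative sketch of a single forward step, not the authors' implementation; all parameter names and sizes are made up for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One forward step of a vanilla LSTM cell (no peepholes).

    x: input vector; h_prev / c_prev: previous hidden and cell state.
    params: dict of weight matrices W_* of shape (hidden, input + hidden)
    and bias vectors b_* for the input (i), forget (f), output (o) gates
    and the candidate cell update (g).  Names are illustrative only.
    """
    z = np.concatenate([x, h_prev])                   # stack input and recurrent state
    i = sigmoid(params["W_i"] @ z + params["b_i"])    # input gate
    f = sigmoid(params["W_f"] @ z + params["b_f"])    # forget gate
    o = sigmoid(params["W_o"] @ z + params["b_o"])    # output gate
    g = np.tanh(params["W_g"] @ z + params["b_g"])    # candidate cell state
    c = f * c_prev + i * g                            # new cell state
    h = o * np.tanh(c)                                # new hidden state
    return h, c

# Toy usage with random parameters (input size 4, hidden size 3).
rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
params = {}
for gate in ("i", "f", "o", "g"):
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(n_hid, n_in + n_hid))
    params[f"b_{gate}"] = np.zeros(n_hid)
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), params)
```

The gate activations decide how much of the previous cell state is kept (f), how much new information enters (i), and how much of the cell is exposed as output (o); the variants analyzed in the paper remove or modify individual components of this structure.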


Discovering Gated Recurrent Neural Network Architectures
TLDR
This chapter proposes a new method, evolution of a tree-based encoding of gated memory nodes, shows that it explores new variations more effectively than other methods, and discovers nodes with multiple recurrent paths and multiple memory cells that lead to significant improvement on the standard language-modeling benchmark task.
Performance of Three Slim Variants of The Long Short-Term Memory (LSTM) Layer
  • Daniel Kent, F. Salem
  • Computer Science
    2019 IEEE 62nd International Midwest Symposium on Circuits and Systems (MWSCAS)
  • 2019
TLDR
Computational analysis of the validation accuracy of a convolutional-plus-recurrent neural network architecture designed for sentiment analysis, comparing the standard LSTM layer with three Slim LSTM layers, finds that some realizations of the Slim LSTM layers can potentially perform as well as the standard LSTM layer for the considered sentiment-analysis architecture.
Restricted Recurrent Neural Networks
TLDR
Experiments on natural language modeling show that, compared with its classical counterpart, the restricted recurrent architecture generally produces comparable results at about a 50% compression rate; in particular, the Restricted LSTM can outperform a classical RNN with even fewer parameters.
Learning compact recurrent neural networks
TLDR
This work studies mechanisms for learning compact RNNs and LSTMs via low-rank factorizations and parameter-sharing schemes, and finds a hybrid strategy of structured matrices in the bottom layers and shared low-rank factors in the top layers to be particularly effective.
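
As a rough illustration of the low-rank idea in the summary above (not the authors' code; sizes and names are arbitrary), an n-by-n recurrent weight matrix can be replaced by two skinny factors, cutting the parameter count from n^2 to 2nr when the chosen rank r is much smaller than n:

```python
import numpy as np

n, r = 512, 64                        # hidden size and chosen rank (illustrative)
rng = np.random.default_rng(0)

# Full recurrent weight matrix: n * n parameters.
W_full = rng.normal(size=(n, n))

# Low-rank replacement W ~= U @ V: n*r + r*n parameters.
U = rng.normal(size=(n, r))
V = rng.normal(size=(r, n))

h = rng.normal(size=n)
full_out = W_full @ h                 # standard recurrent transform
lowrank_out = U @ (V @ h)             # same output shape, 4x fewer parameters here

print(W_full.size, U.size + V.size)   # 262144 vs 65536
```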
From Nodes to Networks: Evolving Recurrent Neural Networks
TLDR
This paper proposes a new method, evolution of a tree-based encoding of gated memory nodes, shows that it explores new variations more effectively than other methods, and discovers nodes with multiple recurrent paths and multiple memory cells that lead to significant improvement on the standard language-modeling benchmark task.
An Empirical Exploration of Recurrent Network Architectures
TLDR
It is found that adding a bias of 1 to the LSTM's forget gate closes the gap between the LSTM and the recently-introduced Gated Recurrent Unit (GRU) on some but not all tasks.
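
The forget-gate trick reported here is easy to state concretely: initialize the forget-gate bias to 1 so that the cell retains its state by default early in training. A hedged sketch against the illustrative parameter dictionary from the LSTM snippet near the top of this page (hypothetical names, not the cited paper's code):

```python
import numpy as np

n_in, n_hid = 4, 3
rng = np.random.default_rng(0)
params = {f"W_{g}": rng.normal(scale=0.1, size=(n_hid, n_in + n_hid))
          for g in ("i", "f", "o", "g")}
params.update({f"b_{g}": np.zeros(n_hid) for g in ("i", "f", "o", "g")})

# The reported trick: bias the forget gate toward staying open
# (sigmoid(1) ~= 0.73), so the cell state, and hence the gradient,
# is mostly preserved early in training.
params["b_f"] = np.ones(n_hid)
```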
MA-LSTM: A Multi-Attention Based LSTM for Complex Pattern Extraction
TLDR
This paper uses a Multiple Attention (MA) based network to generate the forget gate, which refines the optimization space of the gate function and improves the granularity with which the recurrent neural network approximates the ground-truth mapping.
A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures
TLDR
The LSTM cell and its variants are reviewed to explore the learning capacity of the LSTM cell, and LSTM networks are divided into two broad categories: LSTM-dominated networks and integrated LSTM networks.
Investigating gated recurrent neural networks for acoustic modeling
TLDR
GRU usually performs better than LSTM, possibly because GRU can modulate the previous memory content through its learned reset gates, helping to model long-span dependence in speech sequences more efficiently; LSTMP shows performance comparable to GRU.
Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks
TLDR
An open-source, end-to-end, LSTM RNN system running on limited computational resources (a single GPU) that outperforms a reference i-vector system on a subset of the NIST Language Recognition Evaluation (8 target languages, 3s task) by up to 26%.

References

SHOWING 1-10 OF 64 REFERENCES
An Empirical Exploration of Recurrent Network Architectures
TLDR
It is found that adding a bias of 1 to the LSTM's forget gate closes the gap between the LSTM and the recently-introduced Gated Recurrent Unit (GRU) on some but not all tasks.
Long short-term memory recurrent neural network architectures for large scale acoustic modeling
TLDR
The first distributed training of LSTM RNNs using asynchronous stochastic gradient descent optimization on a large cluster of machines is introduced, and it is shown that a two-layer deep LSTM RNN where each LSTM layer has a linear recurrent projection layer can exceed state-of-the-art speech recognition performance.
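
The "linear recurrent projection layer" mentioned in this summary (the variant often abbreviated LSTMP) simply maps the cell output down to a smaller vector before it recurs, shrinking the recurrent weight matrices. The sketch below reuses the gate structure of the earlier LSTM snippet; all names and sizes are illustrative, not taken from the cited paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstmp_step(x, r_prev, c_prev, params):
    """LSTM step with a linear recurrent projection (LSTMP-style sketch).

    The cell output h (size n_cell) is projected to r (size n_proj < n_cell);
    r is what recurs and what upper layers see, so the recurrent part of each
    gate matrix shrinks from n_cell x n_cell to n_cell x n_proj.
    """
    z = np.concatenate([x, r_prev])
    i = sigmoid(params["W_i"] @ z + params["b_i"])
    f = sigmoid(params["W_f"] @ z + params["b_f"])
    o = sigmoid(params["W_o"] @ z + params["b_o"])
    g = np.tanh(params["W_g"] @ z + params["b_g"])
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    r = params["W_proj"] @ h           # linear projection, no nonlinearity
    return r, c

# Toy usage (input 4, cell 8, projection 3); parameters are random.
n_in, n_cell, n_proj = 4, 8, 3
rng = np.random.default_rng(0)
params = {f"W_{g}": rng.normal(scale=0.1, size=(n_cell, n_in + n_proj))
          for g in ("i", "f", "o", "g")}
params.update({f"b_{g}": np.zeros(n_cell) for g in ("i", "f", "o", "g")})
params["W_proj"] = rng.normal(scale=0.1, size=(n_proj, n_cell))
r, c = lstmp_step(rng.normal(size=n_in), np.zeros(n_proj), np.zeros(n_cell), params)
```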
Training Recurrent Networks by Evolino
TLDR
It is shown that Evolino-based LSTM can solve tasks that Echo State nets cannot and achieves higher accuracy in certain continuous function generation tasks than conventional gradient-descent RNNs, including gradient-based LSTM.
Speech recognition with deep recurrent neural networks
TLDR
This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.
Learning to Forget: Continual Prediction with LSTM
TLDR
This work identifies a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset, and proposes a novel, adaptive forget gate that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources.
Dropout Improves Recurrent Neural Networks for Handwriting Recognition
TLDR
It is shown that RNNs with Long Short-Term Memory cells can be greatly improved using dropout, a recently proposed regularization method for deep architectures, even when the network mainly consists of recurrent and shared connections.
Multi-resolution linear prediction based features for audio onset detection with bidirectional LSTM neural networks
TLDR
A multi-resolution approach based on the discrete wavelet transform and linear prediction filtering is presented that improves the time resolution and performance of onset detection in different musical scenarios and significantly outperforms existing methods in terms of F-measure.
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
TLDR
Advanced recurrent units that implement a gating mechanism, such as the long short-term memory (LSTM) unit and the recently proposed gated recurrent unit (GRU), are compared on sequence modeling, and the GRU is found to be comparable to the LSTM.
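
For contrast with the LSTM step sketched near the top of this page, a GRU merges the cell and hidden states and uses only an update gate and a reset gate. Again a minimal, illustrative NumPy sketch with made-up names rather than any paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, params):
    """One forward step of a GRU cell (illustrative sketch)."""
    z_in = np.concatenate([x, h_prev])
    u = sigmoid(params["W_u"] @ z_in + params["b_u"])          # update gate
    r = sigmoid(params["W_r"] @ z_in + params["b_r"])          # reset gate
    z_cand = np.concatenate([x, r * h_prev])                   # reset modulates old memory
    h_tilde = np.tanh(params["W_h"] @ z_cand + params["b_h"])  # candidate state
    return (1.0 - u) * h_prev + u * h_tilde                    # interpolate old and new

# Toy usage (input size 4, hidden size 3) with random parameters.
n_in, n_hid = 4, 3
rng = np.random.default_rng(0)
params = {f"W_{g}": rng.normal(scale=0.1, size=(n_hid, n_in + n_hid))
          for g in ("u", "r", "h")}
params.update({f"b_{g}": np.zeros(n_hid) for g in ("u", "r", "h")})
h = gru_step(rng.normal(size=n_in), np.zeros(n_hid), params)
```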
Dynamic Cortex Memory: Enhancing Recurrent Neural Networks for Gradient-Based Sequence Learning
TLDR
The presented dynamic cortex memory (DCM) is an extension of the well-known long short-term memory (LSTM) model that is able to converge faster during training with back-propagation through time (BPTT) than the LSTM under the same training conditions.