LSTM recurrent networks learn simple context-free and context-sensitive languages
@article{Gers2001LSTMRN, title={LSTM recurrent networks learn simple context-free and context-sensitive languages}, author={Felix A. Gers and J{\"u}rgen Schmidhuber}, journal={IEEE Transactions on Neural Networks}, year={2001}, volume={12}, number={6}, pages={1333--1340} }
Previous work on learning regular languages from exemplary training sequences showed that long short-term memory (LSTM) outperforms traditional recurrent neural networks (RNNs). We demonstrate LSTM's superior performance on context-free language benchmarks for RNNs, and show that it works even better than previous hardwired or highly specialized architectures. To the best of our knowledge, LSTM variants are also the first RNNs to learn a simple context-sensitive language, namely a^n b^n c^n.
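For readers unfamiliar with the benchmark, the minimal Python sketch below generates strings of the context-sensitive language a^n b^n c^n for a next-symbol prediction task. The start/end markers ('S', 'T') and the one-hot encoding are assumptions for illustration, not necessarily the exact setup used in the paper.

```python
# Minimal sketch of a^n b^n c^n next-symbol prediction data
# (assumed markers 'S'/'T' and one-hot coding; illustrative only).
import numpy as np

ALPHABET = ['S', 'a', 'b', 'c', 'T']          # assumed symbol set
IDX = {s: i for i, s in enumerate(ALPHABET)}

def make_string(n):
    """Return one exemplar of a^n b^n c^n with start/end markers."""
    return ['S'] + ['a'] * n + ['b'] * n + ['c'] * n + ['T']

def one_hot(symbols):
    """Encode a symbol sequence as a (length, |alphabet|) one-hot matrix."""
    x = np.zeros((len(symbols), len(ALPHABET)))
    for t, s in enumerate(symbols):
        x[t, IDX[s]] = 1.0
    return x

# Next-symbol prediction: the input at step t is symbol t, the target is symbol t+1.
seq = make_string(4)
inputs, targets = one_hot(seq[:-1]), one_hot(seq[1:])
print(''.join(seq), inputs.shape, targets.shape)
```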
589 Citations
On learning context-free and context-sensitive languages
- Computer Science, IEEE Trans. Neural Networks
- 2002
The long short-term memory (LSTM) is not the only neural network which learns a context sensitive language. Second-order sequential cascaded networks (SCNs) are able to induce means from a finite…
Understanding LSTM - a tutorial into Long Short-Term Memory Recurrent Neural Networks
- Computer Science, ArXiv
- 2019
This tutorial significantly improves documentation and fixes a number of errors and inconsistencies that accumulated in previous publications, focusing on the early, ground-breaking publications of LSTM-RNN.
Learning Context Sensitive Languages with LSTM Trained with Kalman Filters
- Computer Science, ICANN
- 2002
This novel combination of LSTM and the decoupled extended Kalman filter learns even faster and generalizes even better, requiring only the 10 shortest exemplars of the context-sensitive language a^n b^n c^n to deal correctly with values of n up to 1000 and more.
On Evaluating the Generalization of LSTM Models in Formal Languages
- Computer Science
- 2018
This paper empirically evaluates the inductive learning capabilities of Long Short-Term Memory networks, a popular extension of simple RNNs, to learn simple formal languages, in particular a^n b^n, a^n b^n c^n, and a^n b^n c^n d^n.
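To make the generalization protocol concrete, the hedged sketch below implements plain membership checks for a^n b^n c^n and its a^n b^n c^n d^n extension, the kind of ground truth a trained network's predictions on longer strings can be compared against; the helper names are hypothetical and the evaluation code in the cited studies may differ.

```python
import re

def in_anbncn(s):
    """True iff s is a^n b^n c^n for some n >= 1 (equal counts across all three blocks)."""
    m = re.fullmatch(r'(a+)(b+)(c+)', s)
    return bool(m) and len(m.group(1)) == len(m.group(2)) == len(m.group(3))

def in_anbncndn(s):
    """True iff s is a^n b^n c^n d^n for some n >= 1."""
    m = re.fullmatch(r'(a+)(b+)(c+)(d+)', s)
    return bool(m) and len({len(g) for g in m.groups()}) == 1

# Typical protocol: train on short exemplars (small n), then test on much larger n.
assert in_anbncn('aaabbbccc') and not in_anbncn('aaabbccc')
assert in_anbncndn('aabbccdd') and not in_anbncndn('aabbccd')
```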
Revisit Long Short-Term Memory: An Optimization Perspective
- Computer Science
- 2015
This work proposes a matrix-based batch learning method for LSTM with full Backpropagation Through Time (BPTT), and solves the state drifting issues as well as improving the overall performance of LSTM using revised activation functions for the gates.
Incremental training of first order recurrent neural networks to predict a context-sensitive language
- Computer Science, Neural Networks
- 2003
Benchmarking of LSTM Networks
- Computer Science, ArXiv
- 2015
Significant findings include: LSTM performance depends smoothly on learning rates, batching and momentum have no significant effect on performance, softmax training outperforms least-squares training, and peephole units are not useful.
Spoken language understanding using long short-term memory neural networks
- Computer Science, 2014 IEEE Spoken Language Technology Workshop (SLT)
- 2014
This paper investigates using long short-term memory (LSTM) neural networks, which contain input, output, and forget gates and are more advanced than simple RNNs, for the word labeling task, and proposes a regression model on top of the un-normalized LSTM scores to explicitly model output-label dependence.
A generalized LSTM-like training algorithm for second-order recurrent neural networks
- Computer Science, Neural Networks
- 2012
References
Learning to Forget: Continual Prediction with LSTM
- Computer Science, Neural Computation
- 2000
This work identifies a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset, and proposes a novel, adaptive forget gate that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources.
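To make the forget gate concrete, here is a minimal NumPy sketch of a single step of an LSTM cell in the now-standard formulation with input, forget, and output gates; the stacked weight layout and the exact squashing functions are assumptions for illustration and differ in minor details from the original 1997/2000 formulations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step with input (i), forget (f), output (o) gates and candidate (g).

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    stacked in the assumed order [i, f, g, o].
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0*H:1*H])   # input gate: how much new content to write
    f = sigmoid(z[1*H:2*H])   # forget gate: how much old cell state to keep (or reset)
    g = np.tanh(z[2*H:3*H])   # candidate cell content
    o = sigmoid(z[3*H:4*H])   # output gate
    c = f * c_prev + i * g    # cell state: the "constant error carousel", now gated by f
    h = o * np.tanh(c)        # hidden state exposed to the rest of the network
    return h, c

# Tiny usage example with random weights (D inputs, H memory cells).
D, H = 5, 8
rng = np.random.default_rng(0)
W, U, b = rng.normal(size=(4*H, D)), rng.normal(size=(4*H, H)), np.zeros(4*H)
h, c = lstm_step(np.eye(D)[1], np.zeros(H), np.zeros(H), W, U, b)
```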
Learning long-term dependencies with gradient descent is difficult
- Computer Science, IEEE Trans. Neural Networks
- 1994
This work shows why gradient based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching on information for long periods.
Long Short-Term Memory
- Computer Science, Neural Computation
- 1997
A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Recurrent Neural Networks Can Learn to Implement Symbol-Sensitive Counting
- Computer Science, NIPS
- 1997
This work shows that an RNN can learn a harder CFL, a simple palindrome, by organizing its resources into a symbol-sensitive counting solution, and provides a dynamical systems analysis which demonstrates how the network can not only count, but also copy and store counting information.
Recurrent nets that time and count
- Computer Science, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium
- 2000
Surprisingly, LSTM augmented by "peephole connections" from its internal cells to its multiplicative gates can learn the fine distinction between sequences of spikes separated by either 50 or 49 discrete time steps, without the help of any short training exemplars.
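To make the "peephole connections" concrete, the equations below are a commonly used reconstruction of the peephole LSTM update, in which the gates additionally see the cell state (c_{t-1} for the input and forget gates, c_t for the output gate); the notation is assumed for illustration rather than copied from the cited paper.

```latex
\begin{aligned}
i_t &= \sigma\!\left(W_i x_t + U_i h_{t-1} + p_i \odot c_{t-1} + b_i\right)\\
f_t &= \sigma\!\left(W_f x_t + U_f h_{t-1} + p_f \odot c_{t-1} + b_f\right)\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh\!\left(W_c x_t + U_c h_{t-1} + b_c\right)\\
o_t &= \sigma\!\left(W_o x_t + U_o h_{t-1} + p_o \odot c_t + b_o\right)\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```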
A Recurrent Neural Network that Learns to Count
- Computer Science, Connect. Sci.
- 1999
This research employs standard backpropagation training techniques for a recurrent neural network in the task of learning to predict the next character in a simple deterministic CFL (DCFL), and shows that an RNN can learn to recognize the structure of a simple DCFL.
Discrete recurrent neural networks for grammatical inference
- Computer Science, IEEE Trans. Neural Networks
- 1994
A novel neural architecture for learning deterministic context-free grammars, or equivalently, deterministic pushdown automata is described, and a composite error function is described to handle the different situations encountered in learning.
Learning Complex, Extended Sequences Using the Principle of History Compression
- Computer Science, Neural Computation
- 1992
A simple principle for reducing the descriptions of event sequences without loss of information is introduced and this insight leads to the construction of neural architectures that learn to divide and conquer by recursively decomposing sequences.
The Dynamics of Discrete-Time Computation, with Application to Recurrent Neural Networks and Finite State Machine Extraction
- Computer Science, Neural Computation
- 1996
It is shown that an RNN performing a finite state computation must organize its state space to mimic the states in the minimal deterministic finite state machine that can perform that computation, and a precise description of the attractor structure of such systems is given.
Analysis of Dynamical Recognizers
- Computer Science, Neural Computation
- 1997
This article presents an empirical method for testing whether the language induced by the network is regular, and provides a detailed ε-machine analysis of trained networks for both regular and nonregular languages.