Long Short-Term Memory

@article{Hochreiter1997LongSM,
  title={Long Short-Term Memory},
  author={S. Hochreiter and J. Schmidhuber},
  journal={Neural Computation},
  year={1997},
  volume={9},
  pages={1735-1780}
}
Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. [...] Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space [...]
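As a rough illustration of the mechanism described in the abstract, the sketch below implements one step of a 1997-style LSTM cell in NumPy: the cell state is updated purely additively (the constant error carousel), and multiplicative input and output gates control write and read access to it. All parameter names, shapes, and the usage snippet are illustrative assumptions, not taken from the paper.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm97_step(x, h_prev, c_prev, params):
    # One step of a 1997-style LSTM cell (input and output gates only;
    # the forget gate was a later extension). The additive cell update
    # keeps dc_t/dc_{t-1} = 1, i.e. constant error flow through time,
    # while the multiplicative gates learn to open and close access to it.
    W_i, U_i, b_i = params["i"]  # input-gate parameters (illustrative names)
    W_o, U_o, b_o = params["o"]  # output-gate parameters
    W_g, U_g, b_g = params["g"]  # candidate cell-input parameters

    i = sigmoid(W_i @ x + U_i @ h_prev + b_i)  # write access to the cell
    o = sigmoid(W_o @ x + U_o @ h_prev + b_o)  # read access to the cell
    g = np.tanh(W_g @ x + U_g @ h_prev + b_g)  # candidate content

    c = c_prev + i * g       # constant error carousel: additive update
    h = o * np.tanh(c)       # gated exposure of the cell state
    return h, c

# Hypothetical usage on a long random sequence:
rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
params = {k: (0.1 * rng.standard_normal((n_hid, n_in)),
              0.1 * rng.standard_normal((n_hid, n_hid)),
              np.zeros(n_hid))
          for k in ("i", "o", "g")}
h = np.zeros(n_hid)
c = np.zeros(n_hid)
for x in rng.standard_normal((1000, n_in)):
    h, c = lstm97_step(x, h, c, params)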
On the importance of sluggish state memory for learning long term dependency
TLDR
It is demonstrated that an MRN, optimised with noise injection, is able to learn the long term dependency within a complex grammar induction task, significantly outperforming the SRN, NARX and ESN.
Language Modeling through Long-Term Memory Network
TLDR
This paper introduces the Long Term Memory network (LTM), which tackles the exploding and vanishing gradient problems and handles long sequences without forgetting.
Learning Sparse Hidden States in Long Short-Term Memory
TLDR
This work proposes to explicitly impose sparsity on the hidden states to adapt them to the required information and shows that sparsity reduces the computational complexity and improves the performance of LSTM networks.
Learning long-term dependencies in segmented-memory recurrent neural networks with backpropagation of error
TLDR
A comparison on the information latching problem showed that eRTRL is better able to handle the latching of information over longer periods of time, even though eBPTT guaranteed a better generalisation when training was successful, and pre-training significantly improved the ability to learn long-term dependencies with eBPTT.
Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks
TLDR
Backpropagation through the ODE solver allows each layer to adapt its internal time-step, enabling the network to learn task-relevant time-scales and exceed state-of-the-art performance among RNNs on permuted sequential MNIST.
Learning long-term dependencies with recurrent neural networks
TLDR
It is shown that basic time-delay RNNs, unfolded in time and formulated as state-space models, are indeed capable of learning time lags of at least 100 time steps and even possess a self-regularisation characteristic that adapts the internal error backflow; their optimal weight initialisation is also analysed.
Learning Longer Memory in Recurrent Neural Networks
TLDR
This paper shows that learning longer term patterns in real data, such as in natural language, is perfectly possible using gradient descent, through a slight structural modification of the simple recurrent neural network architecture.
On Extended Long Short-term Memory and Dependent Bidirectional Recurrent Neural Network
TLDR
This work first analyzes the memory behavior in three recurrent neural network cells, then introduces trainable scaling factors that act like an attention mechanism to adjust memory decay adaptively, and proposes a dependent bidirectional recurrent neural network (DBRNN).
Low-pass Recurrent Neural Networks - A memory architecture for longer-term correlation discovery
TLDR
A simple, effective memory strategy is proposed that can extend the window over which BPTT can learn without requiring longer traces; the strategy is explored empirically on a few tasks and its implications are discussed.
Internal Memory Gate for Recurrent Neural Networks with Application to Spoken Language Understanding
TLDR
The effectiveness and robustness of the proposed IMG-RNN are evaluated on a classification task over a small corpus of spoken dialogues from the DECODA project, which allows the capability of each RNN to code short-term dependencies to be assessed.

References

Showing 1-10 of 69 references
Learning long-term dependencies in NARX recurrent neural networks
TLDR
It is shown that the long-term dependencies problem is lessened for a class of architectures called nonlinear autoregressive models with exogenous inputs (NARX) recurrent neural networks, which have powerful representational capabilities.
Learning Unambiguous Reduced Sequence Descriptions
TLDR
Experiments show that systems based on these principles can require less computation per time step and many fewer training sequences than conventional training algorithms for recurrent nets.
Bridging Long Time Lags by Weight Guessing and "Long Short Term Memory"
Numerous recent papers (including many NIPS papers) focus on standard recurrent nets' inability to deal with long time lags between relevant input signals and teacher signals. Rather sophisticated, [...]
Induction of Multiscale Temporal Structure
TLDR
Simulation experiments indicate that slower time-scale hidden units are able to pick up global structure, structure that simply cannot be learned by standard backpropagation, using hidden units that operate with different time constants.
Learning Sequential Structure with the Real-Time Recurrent Learning Algorithm
TLDR
A more powerful recurrent learning procedure, called real-time recurrent learning (RTRL), is applied to some of the same problems studied by Servan-Schreiber, Cleeremans, and McClelland; analysis of the internal representations developed by RTRL networks revealed that they learn a rich set of internal states that represent more about the past than is required by the underlying grammar.
A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks
TLDR
This paper proposes a parallel on-line learning algorithm which performs local computations only, yet is still designed to deal with hidden units and with units whose past activations are ‘hidden in time’.
Continuous history compression
TLDR
A continuous version of history compression is described in which elements are discarded in a graded fashion dependent on their predictability, embodied by their (Shannon) information.
Learning long-term dependencies with gradient descent is difficult
TLDR
This work shows why gradient-based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching onto information for long periods.
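To make this difficulty concrete, the standard Jacobian-product argument can be sketched as follows (a textbook-style summary in LaTeX, not a quotation of the paper's own derivation):

% The error signal reaching an earlier hidden state h_k from a loss at time t
% is scaled by a product of step-to-step Jacobians:
\frac{\partial \mathcal{L}_t}{\partial h_k}
  = \frac{\partial \mathcal{L}_t}{\partial h_t}
    \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}},
\qquad
\Bigl\lVert \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}} \Bigr\rVert
  \le \gamma^{\,t-k},
\quad
\gamma = \sup_j \Bigl\lVert \frac{\partial h_j}{\partial h_{j-1}} \Bigr\rVert .

If \gamma < 1 the factor shrinks exponentially with the lag t - k (vanishing gradients), so long-range error signals are swamped by short-range ones; if \gamma > 1 it can grow exponentially (exploding gradients). The constant error carousel described in the LSTM abstract above avoids this by keeping the relevant Jacobian fixed at the identity.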
Learning Complex, Extended Sequences Using the Principle of History Compression
TLDR
A simple principle for reducing the descriptions of event sequences without loss of information is introduced and this insight leads to the construction of neural architectures that learn to divide and conquer by recursively decomposing sequences.
Generalization of backpropagation with application to a recurrent gas market model
TLDR
This paper derives a generalization of backpropagation to recurrent systems (which input their own output), such as hybrids of perceptron-style networks and Grossberg/Hopfield networks, that does not require the storage of intermediate iterations to deal with continuous recurrence.