# Long Short-Term Memory

```bibtex
@article{Hochreiter1997LongSM,
  title   = {Long Short-Term Memory},
  author  = {Sepp Hochreiter and J{\"u}rgen Schmidhuber},
  journal = {Neural Computation},
  year    = {1997},
  volume  = {9},
  pages   = {1735--1780}
}
```

Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. [...] By truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space…
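The constant error carousel and multiplicative gates can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's exact architecture: it uses the modern formulation with a forget gate (which postdates the 1997 paper), and the dimensions and initialization are toy values chosen for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One LSTM step: multiplicative gates open and close access to the
    additively updated cell state c (the 'constant error carousel')."""
    z = W @ np.concatenate([x, h]) + b           # all four pre-activations at once
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input / forget / output gates
    c_new = f * c + i * np.tanh(g)                # additive cell path: stable error flow
    h_new = o * np.tanh(c_new)                    # gated exposure of the cell state
    return h_new, c_new

# Toy dimensions: input size 3, hidden size 2.
rng = np.random.default_rng(0)
n_in, n_h = 3, 2
W = rng.standard_normal((4 * n_h, n_in + n_h)) * 0.1
b = np.zeros(4 * n_h)
h, c = np.zeros(n_h), np.zeros(n_h)
for t in range(5):
    h, c = lstm_step(rng.standard_normal(n_in), h, c, W, b)
```

The key point is the cell update `c_new = f * c + i * tanh(g)`: because the cell state is updated additively rather than through a squashing nonlinearity, gradients along this path need not decay over long time lags.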

## 55,970 Citations

### Learning Long-Term Dependencies in Irregularly-Sampled Time Series

- Computer Science · NeurIPS
- 2020

This work designs a new algorithm based on the long short-term memory (LSTM) that separates its memory from its time-continuous state within the RNN, allowing it to respond to inputs arriving at arbitrary time lags while ensuring constant error propagation through the memory path.

### On the importance of sluggish state memory for learning long term dependency

- Computer Science · Knowl. Based Syst.
- 2016

### Language Modeling through Long-Term Memory Network

- Computer Science · 2019 International Joint Conference on Neural Networks (IJCNN)
- 2019

This paper introduces Long Term Memory network (LTM), which can tackle the exploding and vanishing gradient problems and handles long sequences without forgetting.

### Learning long-term dependencies in segmented-memory recurrent neural networks with backpropagation of error

- Computer Science · Neurocomputing
- 2014

### Learning Sparse Hidden States in Long Short-Term Memory

- Computer Science · ICANN
- 2019

This work proposes to explicitly impose sparsity on the hidden states to adapt them to the required information and shows that sparsity reduces the computational complexity and improves the performance of LSTM networks.

### Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks

- Computer Science · NeurIPS
- 2019

Backpropagation through the ODE solver allows each layer to adapt its internal time-step, enabling the network to learn task-relevant time-scales and exceed state-of-the-art performance among RNNs on permuted sequential MNIST.

### Learning Longer Memory in Recurrent Neural Networks

- Computer Science · ICLR
- 2015

This paper shows that learning longer term patterns in real data, such as in natural language, is perfectly possible using gradient descent, by using a slight structural modification of the simple recurrent neural network architecture.

### On Extended Long Short-term Memory and Dependent Bidirectional Recurrent Neural Network

- Computer Science · Neurocomputing
- 2019

### Low-pass Recurrent Neural Networks - A memory architecture for longer-term correlation discovery

- Computer Science · arXiv
- 2018

A simple, effective memory strategy is proposed that extends the window over which BPTT can learn without requiring longer traces; the strategy is explored empirically on a few tasks and its implications are discussed.
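A low-pass memory of this kind can be illustrated with an exponential moving average of hidden activations: slow components of the signal survive in a summary state even when correlations span more steps than the BPTT window. The function name and smoothing parameter below are hypothetical illustrations, not taken from the paper.

```python
import numpy as np

def lowpass_memory(xs, alpha=0.05):
    """Exponential moving average over a sequence of activation vectors.
    Acts as a low-pass filter: fast fluctuations are attenuated while
    slowly varying structure accumulates in the summary state m.
    alpha (hypothetical smoothing parameter) trades responsiveness
    for memory length."""
    m = np.zeros_like(xs[0])
    for x in xs:
        m = (1 - alpha) * m + alpha * x
    return m

# A constant signal held for 200 steps saturates the memory toward it,
# far beyond the reach of a short truncated-BPTT window.
xs = [np.ones(4)] * 200
m = lowpass_memory(xs)
```

With `alpha = 0.05` the effective time constant is about 20 steps, so a signal held for 200 steps drives the summary state essentially to its value.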

## References

Showing 1–10 of 49 references.

### Learning long-term dependencies in NARX recurrent neural networks

- Computer Science · IEEE Trans. Neural Networks
- 1996

It is shown that the long-term dependencies problem is lessened for a class of architectures called nonlinear autoregressive models with exogenous (NARX) recurrent neural networks, which have powerful representational capabilities.

### Learning Unambiguous Reduced Sequence Descriptions

- Computer Science · NIPS
- 1991

Experiments show that systems based on these principles can require less computation per time step and many fewer training sequences than conventional training algorithms for recurrent nets.

### Bridging Long Time Lags by Weight Guessing and "Long Short Term Memory"

- Computer Science
- 1996

Long short term memory (LSTM), their own recent algorithm, is used to solve hard problems that can neither be quickly solved by random weight guessing nor by any other recurrent net algorithm the authors are aware of.

### Induction of Multiscale Temporal Structure

- Computer Science · NIPS
- 1991

Simulation experiments indicate that, using hidden units that operate with different time constants, slower time-scale hidden units are able to pick up global structure that simply cannot be learned by standard backpropagation.

### Learning Sequential Structure with the Real-Time Recurrent Learning Algorithm

- Computer Science · Int. J. Neural Syst.
- 1989

A more powerful recurrent learning procedure, called real-time recurrent learning (RTRL), is applied to some of the same problems studied by Servan-Schreiber, Cleeremans, and McClelland; analysis of the internal representations developed by RTRL networks revealed that they learn a rich set of internal states representing more about the past than is required by the underlying grammar.

### A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks

- Computer Science
- 1989

This paper proposes a parallel on-line learning algorithm which performs local computations only, yet is still designed to deal with hidden units and with units whose past activations are "hidden in time".

### Continuous history compression

- Computer Science
- 1993

A continuous version of history compression is described in which elements are discarded in a graded fashion depending on their predictability, as measured by their (Shannon) information.

### Learning long-term dependencies with gradient descent is difficult

- Computer Science · IEEE Trans. Neural Networks
- 1994

This work shows why gradient based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching on information for long periods.
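The difficulty can be demonstrated numerically: in a simple tanh RNN whose recurrent weight matrix is contractive, the Jacobian of the hidden state with respect to an early state shrinks geometrically with the length of the dependency. The dimensions and the spectral norm of 0.9 below are arbitrary choices for the sketch.

```python
import numpy as np

# The Jacobian of h_T with respect to h_0 in h_t = tanh(W @ h_{t-1}) is a
# product of T factors diag(1 - h_t**2) @ W. With ||W||_2 < 1 and tanh'
# bounded by 1, each factor has norm at most ||W||_2, so the product norm
# decays geometrically: the vanishing gradient.
rng = np.random.default_rng(1)
n = 8
W = rng.standard_normal((n, n))
W *= 0.9 / np.linalg.norm(W, 2)   # rescale to spectral norm 0.9 (contractive)

h = rng.standard_normal(n)
J = np.eye(n)                     # accumulated Jacobian d h_t / d h_0
norms = []
for t in range(50):
    h = np.tanh(W @ h)
    J = np.diag(1 - h**2) @ W @ J # chain rule through one time step
    norms.append(np.linalg.norm(J))
```

After 50 steps the Jacobian norm has collapsed by orders of magnitude, which is why error signals from distant time steps carry almost no weight; the flip side of the trade-off is that making the dynamics non-contractive to latch information tends to make learning unstable.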

### Generalization of backpropagation with application to a recurrent gas market model

- Mathematics · Neural Networks
- 1988

### Gradient calculations for dynamic recurrent neural networks: a survey

- Computer Science · IEEE Trans. Neural Networks
- 1995

The author discusses advantages and disadvantages of temporally continuous neural networks in contrast to clocked ones and presents some "tricks of the trade" for training, using, and simulating continuous time and recurrent neural networks.