Long Short-Term Memory

@article{Hochreiter1997LongSM,
  title={Long Short-Term Memory},
  author={S. Hochreiter and J. Schmidhuber},
  journal={Neural Computation},
  year={1997},
  volume={9},
  pages={1735--1780}
}
  • S. Hochreiter, J. Schmidhuber
  • Published 1997
  • Computer Science, Medicine
  • Neural Computation
  • Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. [...] Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space…
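
The mechanism named in the abstract, a constant error carousel guarded by multiplicative gates, can be illustrated with a minimal sketch of a single scalar memory cell. This is not the paper's implementation: the function name `memory_cell_step`, the weight layout, and the use of `tanh` as the squashing function are assumptions made for illustration, and the gradient truncation mentioned in the abstract is not shown. The sketch uses only an input gate and an output gate.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, used here to squash gate activations into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def memory_cell_step(x, h_prev, c_prev, w):
    """One time step of a single scalar memory cell (illustrative only).

    x       current input
    h_prev  previous cell output
    c_prev  previous internal state (the "carousel")
    w       dict of hypothetical weights: name -> (input weight, recurrent weight, bias)
    """
    def pre(name):
        wx, wh, b = w[name]
        return wx * x + wh * h_prev + b

    i = sigmoid(pre("in_gate"))   # input gate: controls what is written to the state
    o = sigmoid(pre("out_gate"))  # output gate: controls what is read from the state
    g = math.tanh(pre("cell"))    # candidate content (tanh is an assumption)

    # Constant error carousel: the self-connection has fixed weight 1.0,
    # so dc/dc_prev == 1 and error flowing along this path is not rescaled.
    c = 1.0 * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


# Usage: with the input gate held shut, the stored value survives 1000 steps.
weights = {
    "in_gate": (0.0, 0.0, -50.0),  # large negative bias: gate almost fully closed
    "out_gate": (0.0, 0.0, 50.0),  # large positive bias: gate almost fully open
    "cell": (1.0, 0.0, 0.0),
}
h, c = 0.0, 0.75
for _ in range(1000):
    h, c = memory_cell_step(1.0, h, c, weights)
print(c, h)
```

Because the state update is purely additive with a unit self-weight, the stored value neither decays nor grows while the input gate is closed, which is the property the abstract credits for bridging long time lags.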
