Learning to Forget: Continual Prediction with LSTM

@article{Gers2000LearningTF,
  title={Learning to Forget: Continual Prediction with LSTM},
  author={Felix A. Gers and J{\"u}rgen Schmidhuber and Fred A. Cummins},
  journal={Neural Computation},
  year={2000},
  volume={12},
  pages={2451-2471}
}
  • Felix A. Gers, Jürgen Schmidhuber, Fred A. Cummins
  • Published 2000
  • Computer Science, Mathematics
  • Neural Computation
  • Long short-term memory (LSTM; Hochreiter & Schmidhuber, 1997) can solve numerous tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset. Without resets, the state may grow indefinitely and eventually cause the network to break down. Our remedy is a novel…
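
The weakness described in the truncated abstract is addressed in the paper by an adaptive forget gate added to the LSTM cell. Below is a minimal NumPy sketch of a single cell step with such a gate; the weight names, dimensions, and random initialization are illustrative assumptions rather than the paper's own notation or implementation.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    # One step of an LSTM cell with a forget gate f. When f is near 0 the
    # cell state is reset; when f is near 1 the old state is carried over,
    # so the state no longer grows without bound on a continual input stream.
    W_f, W_i, W_o, W_c, b_f, b_i, b_o, b_c = params  # hypothetical parameter layout
    z = np.concatenate([h_prev, x])                  # recurrent output and current input
    f = sigmoid(W_f @ z + b_f)                       # forget gate
    i = sigmoid(W_i @ z + b_i)                       # input gate
    o = sigmoid(W_o @ z + b_o)                       # output gate
    c_tilde = np.tanh(W_c @ z + b_c)                 # candidate cell state
    c = f * c_prev + i * c_tilde                     # forget gate scales the old state
    h = o * np.tanh(c)                               # cell output
    return h, c

# Toy usage on a continual stream that is never segmented or reset externally
# (sizes and inputs are hypothetical).
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
params = [0.1 * rng.standard_normal((n_hid, n_hid + n_in)) for _ in range(4)] \
       + [np.zeros(n_hid) for _ in range(4)]
h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(1000):
    h, c = lstm_step(rng.standard_normal(n_in), h, c, params)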

    Citations

    • A generalized LSTM-like training algorithm for second-order recurrent neural networks (cited 52 times)
    • Learning Precise Timing with LSTM Recurrent Networks (cited 935 times)
    • Learning compact recurrent neural networks (cited 64 times)
    • Radically Simplifying Gated Recurrent Architectures Without Loss of Performance
    • Gated Orthogonal Recurrent Units: On Learning to Forget (cited 55 times)
    • Training Recurrent Networks by Evolino (cited 224 times)
    • A review on the long short-term memory model

    References

    Showing 1-10 of 57 references.
    • Learning to Forget: Continual Prediction with LSTM (cited 11 times)
    • Long Short-Term Memory (cited 29817 times)
    • Encoding sequential structure: experience with the real-time recurrent learning algorithm (cited 41 times)
    • Gradient calculations for dynamic recurrent neural networks: a survey (cited 549 times)
    • The Recurrent Cascade-Correlation Architecture (cited 169 times)
    • Finite State Automata and Simple Recurrent Networks (cited 410 times)