Learning to Forget: Continual Prediction with LSTM

  title={Learning to Forget: Continual Prediction with LSTM},
  author={F. Gers and J. Schmidhuber and Fred A. Cummins},
  journal={Neural Computation},
  • F. Gers, J. Schmidhuber, Fred A. Cummins
  • Published 2000
  • Psychology, Computer Science, Medicine
  • Neural Computation
  • Long short-term memory (LSTM; Hochreiter & Schmidhuber, 1997) can solve numerous tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset. Without resets, the state may grow indefinitely and eventually cause the network to break down. Our remedy is a novel…
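The failure mode described in the abstract can be illustrated with a toy, untrained memory cell. This is a minimal sketch under stated assumptions, not the paper's experiment or its remedy. It assumes the original LSTM-style update, in which the cell's internal state feeds back to itself with a fixed weight of 1.0 and is only ever added to. The weights `w_in` and `w_c`, the helper `cell_state_trace`, the use of tanh as the cell-input squashing function, and the stream of positive inputs are all hypothetical choices made for illustration. Without resets the state keeps drifting upward. Zeroing it at artificially marked segment boundaries keeps it bounded.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def cell_state_trace(stream, w_in=1.0, w_c=1.0, reset_every=None):
    """Return the internal state of one toy memory cell after each input.

    Assumed update (original LSTM style, hypothetical untrained weights):
        s_t = s_{t-1} + in_gate_t * g(x_t)
    The state's self-connection has a fixed weight of 1.0, so there is no
    decay term pulling s back toward zero. tanh stands in for the
    cell-input squashing function (an assumption for this sketch).
    If reset_every is set, s is zeroed at the start of each segment of
    that length, mimicking externally marked sequence ends.
    """
    s = 0.0
    trace = []
    for t, x in enumerate(stream):
        if reset_every is not None and t % reset_every == 0:
            s = 0.0  # explicit reset at a marked segment boundary
        in_gate = sigmoid(w_in * x)  # input gate activation in (0, 1)
        s = s + in_gate * math.tanh(w_c * x)  # accumulate; no decay
        trace.append(s)
    return trace


# A continual stream of positive inputs with no natural segment ends.
rng = random.Random(0)
stream = [rng.uniform(0.0, 1.0) for _ in range(1000)]

no_reset = cell_state_trace(stream)
with_reset = cell_state_trace(stream, reset_every=50)

print(f"final |s| without resets:     {abs(no_reset[-1]):.1f}")
print(f"max |s| with resets every 50: {max(abs(v) for v in with_reset):.1f}")
```

Because every input here is non-negative, each increment is non-negative. The unreset state therefore grows roughly linearly with the length of the stream, while the reset version can never exceed what 50 consecutive increments add up to.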
    2,026 Citations
    Learning Precise Timing with LSTM Recurrent Networks
    • 1,027
    A generalized LSTM-like training algorithm for second-order recurrent neural networks
    • 57
    Learning compact recurrent neural networks
    • 72
    A review on the long short-term memory model
    • 5
    Gated Orthogonal Recurrent Units: On Learning to Forget
    • 67
    Radically Simplifying Gated Recurrent Architectures Without Loss of Performance
    • J. Boardman, Y. Xie
    • Computer Science
    • 2019 IEEE International Conference on Big Data (Big Data)
    • 2019
    Training Recurrent Networks by Evolino
    • 228
    The Performance of LSTM and BiLSTM in Forecasting Time Series
    • 20


    References
    Learning to Forget: Continual Prediction with LSTM
    • 12
    Long Short-Term Memory
    • 35,817
    LSTM recurrent networks learn simple context-free and context-sensitive languages
    • 483
    Encoding sequential structure: experience with the real-time recurrent learning algorithm
    • 44
    Learning long-term dependencies with gradient descent is difficult
    • 4,967
    Learning long-term dependencies in NARX recurrent neural networks
    • 545
    Gradient calculations for dynamic recurrent neural networks: a survey
    • 584
    The Recurrent Cascade-Correlation Architecture
    • 194
    Finite State Automata and Simple Recurrent Networks
    • 489