# Learning to Forget: Continual Prediction with LSTM

```bibtex
@article{Gers2000LearningTF,
  title   = {Learning to Forget: Continual Prediction with LSTM},
  author  = {F. Gers and J. Schmidhuber and Fred A. Cummins},
  journal = {Neural Computation},
  year    = {2000},
  volume  = {12},
  pages   = {2451--2471}
}
```

Long short-term memory (LSTM; Hochreiter & Schmidhuber, 1997) can solve numerous tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset. Without resets, the state may grow indefinitely and eventually cause the network to break down. Our remedy is a novel…
