• Corpus ID: 7498449

Learning to Forget: Continual Prediction with LSTM

@inproceedings{Gers1999LearningTF,
  title={Learning to Forget: Continual Prediction with LSTM},
  author={Felix Alexander Gers and J{\"u}rgen Schmidhuber and Fred Cummins},
  year={1999}
}
Long Short-Term Memory (LSTM [5]) can solve many tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks processing continual input streams without explicitly marked sequence ends. Without resets, the internal state values may grow indefinitely and eventually cause the network to break down. Our remedy is an adaptive "forget gate" that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing…
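The forget-gate mechanism described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the weight layout (a single matrix mapping the concatenated input and previous hidden state to the four stacked gate pre-activations) and all names are assumptions for the sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of an LSTM cell with a forget gate.

    Hypothetical minimal layout: W maps the concatenated [x; h_prev]
    to the pre-activations of the four gates, stacked as (f, i, g, o).
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0:H])          # forget gate: f near 0 resets the cell state
    i = sigmoid(z[H:2 * H])      # input gate
    g = np.tanh(z[2 * H:3 * H])  # candidate cell update
    o = sigmoid(z[3 * H:4 * H])  # output gate
    c = f * c_prev + i * g       # without the forget gate this is c_prev + i * g
    h = o * np.tanh(c)
    return h, c
```

The key line is `c = f * c_prev + i * g`: when the learned forget activation `f` approaches zero, the old internal state is discarded, which is exactly the "learn to reset itself" behavior the abstract proposes for continual input streams.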

Learning to Forget: Continual Prediction with LSTM

This work identifies a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset, and proposes a novel, adaptive forget gate that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources.

The comparison of autoencoder architectures in improving of prediction models

  • A. Prosvetov
  • Computer Science
    Journal of Physics: Conference Series
  • 2018
It is found that appropriate event encoding improves the quality of CNN-based networks without modifying their architectures.

Multilayer LSTM with Global Access Gate for Predicting Students Performance Using Online Behaviors

A Monte-Carlo-based feature selection algorithm to select the best feature set for representing student behaviors based on long short-term memory that incorporates global features and considers the temporal behavior of students is proposed.

Self-attention based bidirectional long short-term memory-convolutional neural network classifier for the prediction of ischemic and non-ischemic cardiomyopathy

A new unified architecture comprising convolutional neural network (inception-V3 model) and bidirectional long short-term memory (BiLSTM) with self-attention mechanism to predict the ischemic or non-ischemic to classify cardiomyopathy using histopathological images is proposed.

Modelling Speaker-dependent Auditory Attention Using A Spiking Neural Network with Temporal Coding and Supervised Learning

This paper studies ring-type digital spiking neural networks that can exhibit multi-phase synchronization phenomena of various periodic spike-trains and investigates relationship between approximation error and the network size.

Short-term speed prediction of urban roads based on multi-source feature fusion

  • Silin Liu, Zhuhua Liao, Yizhi Liu, Aiping Yi
  • Computer Science
    2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys)
  • 2021
The results show that MF-BiLSTM achieves the best prediction performance in terms of prediction error, outperforming ARIMA, LSTM, CNN, and BiLSTM.

EnergyNet: Energy-Efficient Dynamic Inference

A CNN for energy-aware dynamic routing, called EnergyNet, is proposed that achieves adaptive-complexity inference based on the inputs, leading to an overall reduction of run time energy cost while actually improving accuracy.

Prediction of Sea Clutter Based on Recurrent Neural Network

The model used in this paper has a smaller prediction error than RBF, the prediction performance is better, the model can achieve high-precision and high-efficiency prediction of sea clutter and the effectiveness of the method is verified.

Integrating genotype and weather variables for soybean yield prediction using deep learning

A machine learning framework in soybean is presented to analyze historical performance records from Uniform Soybean Tests in North America with an aim to dissect and predict genotype response in multiple envrionments leveraging pedigree and genomic relatedness measures along with weekly weather parameters.

References

SHOWING 1-10 OF 10 REFERENCES

Learning to Forget: Continual Prediction with LSTM

This work identifies a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset, and proposes a novel, adaptive forget gate that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources.

Long Short-Term Memory

A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
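The "constant error flow through constant error carousels" idea in the summary above can be illustrated with a toy calculation. This is a sketch, not the paper's formulation: the self-weight value 0.9 and the function names below are arbitrary choices for illustration.

```python
def backprop_factor(self_weight: float, steps: int) -> float:
    """Factor by which an error signal is scaled after being
    backpropagated through `steps` time steps of a linear
    self-recurrent unit."""
    return self_weight ** steps

# Constant error carousel: self-connection weight fixed at exactly 1.0,
# so the error signal is preserved over arbitrarily long time lags.
cec = backprop_factor(1.0, 1000)

# Ordinary unit with self-weight below 1: the error signal vanishes
# long before it can bridge a 1000-step lag.
leaky = backprop_factor(0.9, 1000)
```

This is why enforcing a self-connection of weight 1 lets LSTM bridge the 1000-plus-step time lags the summary mentions, while plain gradient-based recurrent training cannot.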

Learning long-term dependencies with gradient descent is difficult

This work shows why gradient based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching on information for long periods.

Learning Sequential Structure with the Real-Time Recurrent Learning Algorithm

A more powerful recurrent learning procedure, called real-time recurrent learning (RTRL), is applied to some of the same problems studied by Servan-Schreiber, Cleeremans, and McClelland; analysis of the internal representations developed by RTRL networks revealed that they learn a rich set of internal states that represent more about the past than is required by the underlying grammar.

Finite State Automata and Simple Recurrent Networks

A network architecture introduced by Elman (1988) for predicting successive elements of a sequence is examined, and it is shown that long-distance sequential contingencies can be encoded by the network even if only subtle statistical properties of embedded strings depend on the early information.

The recurrent cascade-correlation learning algorithm

  • The recurrent cascade-correlation learning algorithm
  • 1991

Long Short-Term Memory

  • Neural Computation
  • 1997

The utility driven dynamic error propagation network

  • The utility driven dynamic error propagation network
  • 1987