Graceful Forgetting II. Data as a Process

Alain de Cheveigné
This is the second part of a two-part essay on memory and its inseparable nemesis, forgetting. It looks at memory from a computational perspective, in terms of function and constraints, in the rational spirit of Marr (1982) or Anderson (1989). The core question is: how to fit an infinite past into finite storage? The requirements and benefits of such a “scalable” data store are analyzed and the consequences explored, the main one being that conserving data should be seen as a process…



Memory Networks

This work describes a new class of learning models called memory networks, which reason with inference components combined with a long-term memory component; they learn how to use these jointly.

Hybrid computing using a neural network with dynamic external memory

This work introduces a machine learning model called a differentiable neural computer (DNC), consisting of a neural network that can read from and write to an external memory matrix, analogous to the random-access memory in a conventional computer.

Episodic Memory in Lifelong Language Learning

This work proposes an episodic memory model that performs sparse experience replay and local adaptation to mitigate catastrophic forgetting in a lifelong language learning setup where a model needs to learn from a stream of text examples without any dataset identifier.

End-To-End Memory Networks

A neural network with a recurrent attention model over a possibly large external memory that is trained end-to-end, and hence requires significantly less supervision during training, making it more generally applicable in realistic settings.

Long Short-Term Memory

A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

Predictability, Complexity, and Learning

It is argued that the divergent part of I_pred(T), the mutual information between the past and the future of a time series, provides the unique measure for the complexity of the dynamics underlying a time series.


It is a mistake to consider perception and learning separately, because what one learns is strongly constrained by what one perceives, and what one perceives depends on what one has experienced.

Episodic Memory Reader: Learning What to Remember for Question Answering from Streaming Data

This work proposes Episodic Memory Reader (EMR), a novel end-to-end deep network model for reading comprehension that sequentially reads the input contexts into an external memory, while replacing memories that are less important for answering unseen questions.

Memory as Perception of the Past: Compressed Time in Mind and Brain