Reinforcement Learning and Episodic Memory in Humans and Animals: An Integrative Framework

@article{Gershman2017ReinforcementLA,
  title={Reinforcement Learning and Episodic Memory in Humans and Animals: An Integrative Framework},
  author={Samuel J. Gershman and Nathaniel D. Daw},
  journal={Annual Review of Psychology},
  year={2017},
  volume={68},
  pages={101--128}
}
  • S. Gershman, N. Daw
  • Published 4 January 2017
  • Psychology, Computer Science, Biology
  • Annual Review of Psychology
We review the psychology and neuroscience of reinforcement learning (RL), which has experienced significant progress in the past two decades, enabled by the comprehensive experimental study of simple learning and decision-making tasks. However, one challenge in the study of RL is computational: the simplicity of these tasks ignores important aspects of reinforcement learning in the real world: (a) state spaces are high-dimensional, continuous, and partially observable; this implies that (b…

Citations

Reinforcement Learning and Attractor Neural Network Models of Associative Learning

TLDR
This work argues that challenges in reinforcement learning can be met by infusing the RL framework, as an algorithmic theory of human behavior, with the strengths of the attractor framework at the level of neural implementation; it is supported by the hypothesis that 'attractor states', stable patterns of self-sustained and reverberating brain activity, are a manifestation of the collective dynamics of neuronal populations in the brain.
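
To make the attractor idea concrete, here is a minimal Hopfield-style sketch (the function name and synchronous-update scheme are illustrative assumptions, not the paper's model), in which recurrent dynamics settle a noisy input into the nearest stored pattern:

    import numpy as np

    def hopfield_attractor(patterns, probe, steps=20):
        """Minimal attractor-network sketch: store binary (+/-1) patterns with a
        Hebbian weight matrix, then let recurrent dynamics settle a noisy probe
        into the nearest stored 'attractor state' (a stable, self-sustaining
        activity pattern)."""
        P = np.asarray(patterns, dtype=float)   # shape: (n_patterns, n_units)
        W = P.T @ P / P.shape[1]                # Hebbian outer-product learning
        np.fill_diagonal(W, 0.0)                # no self-connections
        x = np.asarray(probe, dtype=float)
        for _ in range(steps):                  # synchronous updates
            x = np.where(W @ x >= 0, 1.0, -1.0)
        return x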

Meta-Reinforcement Learning with Episodic Recall: An Integrative Theory of Reward-Driven Learning

TLDR
In this theory of episodic meta-RL (EMRL), episodic memory reinstates activations in the prefrontal network based on contextual similarity, after passing them through a learned gating mechanism; the theory is extended to provide an account of episodic learning, incremental learning, and the coordination between them.
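
As a rough sketch of the retrieval-and-gating idea (the similarity measure, gate parameterization, and names are assumptions for illustration, not the authors' implementation):

    import numpy as np

    def emrl_reinstate(context, memory_contexts, memory_activations, gate_w, gate_b):
        """Hypothetical sketch: retrieve a stored activation pattern by contextual
        similarity, then pass it through a learned sigmoid gate before reinstating
        it in the recurrent (prefrontal-like) network."""
        # Cosine similarity between the current context and each stored context.
        sims = memory_contexts @ context / (
            np.linalg.norm(memory_contexts, axis=1) * np.linalg.norm(context) + 1e-8)
        best = int(np.argmax(sims))              # most similar stored episode
        gate = 1.0 / (1.0 + np.exp(-(gate_w * sims[best] + gate_b)))  # learned gate
        return gate * memory_activations[best]   # gated reinstatement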

Ventral Striatum Lesions Do Not Affect Reinforcement Learning With Deterministic Outcomes on Slow Time Scales

TLDR
This study examined learning of 60 pairs of objects, in which the animals received only 1 trial per day with each pair, and found that monkeys with VS lesions were unimpaired relative to controls, suggesting that animals with VS lesions can still learn to select rewarded objects even when they cannot make use of working memory.

Episodic Memory for Learning Subjective-Timescale Models

TLDR
This work devises a novel approach to learning a transition dynamics model based on the sequences of episodic memories that define the agent's subjective timescale, over which it learns world dynamics and performs future planning.

The functional form of value normalization in human reinforcement learning

Reinforcement learning research in humans and other species indicates that rewards are represented in a context-dependent manner. More specifically, reward representations seem to be normalized as a…
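
The snippet above is cut off, but range (divisive) normalization is one candidate functional form; a minimal sketch, with the specific form and names chosen for illustration rather than taken from the paper:

    def range_normalize(reward, r_min, r_max):
        """Range normalization: rescale a reward by the minimum and maximum
        outcomes available in the current context, so the same objective reward
        is represented differently in different contexts."""
        return (reward - r_min) / (r_max - r_min + 1e-8)

    # In a context where outcomes span 0..10, a reward of 7.5 is represented as ~0.75.
    print(range_normalize(7.5, 0.0, 10.0))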

Time cell encoding in deep reinforcement learning agents depends on mnemonic demands

TLDR
Deep reinforcement learning agents are trained on a simulated trial-unique nonmatch-to-location (TUNL) task, and the activities of artificial recurrent units are analyzed using neuroscience-based methods, providing a normative framework that reconciles current discrepancies regarding the involvement of time cells in memory encoding.

Forgetting Enhances Episodic Control With Structured Memories

TLDR
It is shown that a reinforcement learning agent that uses an episodic memory cache to find rewards in maze environments can forget a large percentage of older memories without any performance impairment, provided it uses mnemonic representations that contain structural information about space.
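
A minimal episodic-control sketch along these lines (the class name, capacity, and random-eviction forgetting rule are illustrative assumptions; the paper's point concerns which state representations make forgetting cheap):

    import random

    class EpisodicCache:
        """Store the best return observed for each (state, action) key and act
        greedily on cached values; forgetting is modeled here as random eviction
        once the cache exceeds its capacity."""
        def __init__(self, capacity=500):
            self.capacity = capacity
            self.values = {}  # (state, action) -> best return seen so far

        def write(self, state, action, ret):
            key = (state, action)
            self.values[key] = max(ret, self.values.get(key, float("-inf")))
            if len(self.values) > self.capacity:          # forget an old memory
                self.values.pop(random.choice(list(self.values)))

        def act(self, state, actions):
            return max(actions, key=lambda a: self.values.get((state, a), 0.0))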

The Effect of State Representations in Sequential Sensory Prediction: Introducing the Shape Sequence Task

TLDR
This work introduces a novel sequence prediction task with hidden structure, in which participants must combine learning and memory to find the proper state representation without the task explicitly indicating such structure, and argues that this task allows investigation of previously proposed models of state and task representations.

The Successor Representation: Its Computational Logic and Neural Substrates

  • S. Gershman
  • Computer Science, Psychology
  • The Journal of Neuroscience
  • 2018
TLDR
This paper reviews progress on the successor representation, which encodes states of the environment in terms of their predictive relationships with other states, and a broader framework for understanding how the brain negotiates tradeoffs between efficiency and flexibility in reinforcement learning.
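
The core of the successor representation can be written as a TD(0) update on a matrix of expected discounted future state occupancies; a minimal sketch (variable names assumed for illustration):

    import numpy as np

    def sr_td_update(M, s, s_next, alpha=0.1, gamma=0.9):
        """One TD(0) update of the successor representation M, where M[s, s'] is
        the expected discounted number of future visits to s' starting from s."""
        onehot = np.zeros(M.shape[0])
        onehot[s] = 1.0
        M[s] += alpha * (onehot + gamma * M[s_next] - M[s])
        return M

    # Values factor into predictive structure times reward: V = M @ R,
    # so changing R revalues states without relearning the transition structure.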

Hippocampal Contribution to Probabilistic Feedback Learning: Modeling Observation- and Reinforcement-based Processes

TLDR
Results suggested that OL processes may indeed take place concomitantly with reinforcement learning and involve activation of the hippocampus and central orbitofrontal cortex; enhanced negative RL prediction-error signaling was found in the anterior insula with greater use of OL over RL processes.
...

References

SHOWING 1-10 OF 160 REFERENCES

How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis

TLDR
This study proposes a new computational model that accounts for the dynamic integration of RL and WM processes observed in subjects' behavior, and specifies distinct influences of high-level and low-level cognitive functions on instrumental learning, beyond the possibilities offered by simple RL models.
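
A minimal sketch of the mixture idea, in which choice probabilities blend a fast but capacity-limited WM policy with a slow RL policy (the weighting scheme below is an illustrative assumption, not the paper's exact model):

    import numpy as np

    def rlwm_policy(p_rl, p_wm, set_size, capacity=3.0, rho=0.9):
        """Blend a working-memory policy with an RL policy; WM reliance shrinks
        as the stimulus set size exceeds WM capacity."""
        w = rho * min(1.0, capacity / set_size)   # WM weight falls with load
        return w * np.asarray(p_wm) + (1.0 - w) * np.asarray(p_rl)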

Reinforcement Learning in Multidimensional Environments Relies on Attention Mechanisms

TLDR
The results suggest that a bilateral attentional control network comprising the intraparietal sulcus, precuneus, and dorsolateral prefrontal cortex is involved in selecting what dimensions are relevant to the task at hand, effectively updating the task representation through trial and error.

Learning to Use Working Memory in Partially Observable Environments through Dopaminergic Reinforcement

TLDR
A normative model is presented that learns, by online temporal difference methods, to use working memory to maximize discounted future reward in partially observable settings and successfully solves a benchmark working memory problem.
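
A rough sketch of TD learning over a state augmented with gated working-memory content (the dictionary representation and names are illustrative assumptions, not the paper's implementation):

    def td_wm_step(Q, obs, wm, action, gate, reward, obs_next, wm_next,
                   alpha=0.1, gamma=0.95):
        """The agent selects a motor action and a memory 'gate' action (store or
        ignore the current observation); both are trained by the same TD error.
        Q maps ((obs, wm), (action, gate)) -> value."""
        s, a = (obs, wm), (action, gate)
        s_next = (obs_next, wm_next)
        best_next = max((v for (st, _), v in Q.items() if st == s_next), default=0.0)
        delta = reward + gamma * best_next - Q.get((s, a), 0.0)   # TD error
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * delta
        return Q, delta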

Reinforcement learning in the brain

Episodic Memory Encoding Interferes with Reward Learning and Decreases Striatal Prediction Errors

TLDR
It is found that better episodic memory was related to a decreased influence of recent reward experience on choice, both within and across participants, and fMRI analyses revealed that during learning the canonical striatal reward prediction error signal was significantly weaker when episodic memory was stronger.
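
The prediction-error signal in question is the standard delta of incremental RL; a minimal sketch for a bandit-style task (names assumed for illustration):

    def rpe_update(V, state, reward, alpha=0.1):
        """Canonical reward-prediction-error update: delta is the signal related
        to striatal BOLD activity, which the paper finds is weaker when episodic
        encoding is stronger."""
        delta = reward - V.get(state, 0.0)       # reward prediction error
        V[state] = V.get(state, 0.0) + alpha * delta
        return V, delta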

Human-level control through deep reinforcement learning

TLDR
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
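
The heart of the DQN update is regression of Q(s, a) toward a bootstrapped target computed with a frozen target network; a minimal sketch assuming PyTorch (the full agent also uses experience replay and an epsilon-greedy policy):

    import torch

    def dqn_loss(q_net, target_net, batch, gamma=0.99):
        """Compute the DQN loss: regress Q(s, a) toward the target
        r + gamma * max_a' Q_target(s', a'), with gradients blocked through
        the target network."""
        s, a, r, s_next, done = batch            # tensors from a replay buffer
        q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)     # Q(s, a)
        with torch.no_grad():
            q_next = target_net(s_next).max(dim=1).values     # max_a' Q_target
            target = r + gamma * (1.0 - done) * q_next
        return torch.nn.functional.mse_loss(q, target)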

Context, learning, and extinction.

TLDR
It is shown that online Bayesian inference within a model that assumes an unbounded number of latent causes can characterize a diverse set of behavioral results from such manipulations, some of which pose problems for the model of Redish et al. (2007).
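
The latent-cause prior in such models is typically a Chinese restaurant process; a minimal sketch of one posterior update (the names and single-step form are illustrative assumptions):

    import numpy as np

    def latent_cause_posterior(counts, likelihoods, alpha=1.0):
        """Posterior over latent causes for the current observation: the prior
        assigns existing causes probability proportional to their counts and a
        new cause probability proportional to alpha (so `likelihoods` must have
        one more entry than `counts`, for the new cause)."""
        prior = np.append(np.asarray(counts, dtype=float), alpha)
        prior /= prior.sum()
        post = prior * np.asarray(likelihoods)
        return post / post.sum()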

The Curse of Planning

TLDR
It is demonstrated that having human decision makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement-learning strategy, and that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources.
...