The Successor Representation: Its Computational Logic and Neural Substrates

@article{Gershman2018TheSR,
  title={The Successor Representation: Its Computational Logic and Neural Substrates},
  author={Samuel J. Gershman},
  journal={The Journal of Neuroscience},
  year={2018},
  volume={38},
  pages={7193--7200}
}
  • S. Gershman
  • Published 13 July 2018
  • Computer Science, Psychology
  • The Journal of Neuroscience
Reinforcement learning is the process by which an agent learns to predict long-term future reward. We now understand a great deal about the brain's reinforcement learning algorithms, but we know considerably less about the representations of states and actions over which these algorithms operate. A useful starting point is asking what kinds of representations we would want the brain to have, given the constraints on its computational architecture. Following this logic leads to the idea of the… 
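
The abstract is cut off just before it names the paper's central construct. For orientation, the standard successor-representation identities the title refers to (Dayan, 1993) can be stated compactly; the following is a minimal sketch of those textbook definitions, not text recovered from the truncated abstract:

```latex
% Successor representation under a fixed policy \pi:
% M^{\pi}(s,s') is the expected discounted number of future visits to s'
% when starting from state s.
M^{\pi}(s,s') = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t}\,
    \mathbb{I}\{s_t = s'\} \,\middle|\, s_0 = s \right]

% Values then factor into a predictive map and a reward vector, which is what
% lets rewards change without relearning the environment's structure:
V^{\pi}(s) = \sum_{s'} M^{\pi}(s,s')\, R(s')
```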

Citations

Successor Feature Neural Episodic Control
TLDR
A combination of episodic control and successor features in a single reinforcement learning framework is outlined, and its benefits are empirically illustrated.
Predicting the Future with Multi-scale Successor Representations
TLDR
An ensemble of SRs with multiple scales is proposed, and it is shown that the derivative of the multi-scale SR can reconstruct the sequence of expected future states, estimate the distance to goal, and be computed linearly (see the sketch after this list).
A First-Occupancy Representation for Reinforcement Learning
TLDR
The first-occupancy representation (FR), which measures the expected temporal discount to the first time a state is accessed, is introduced; it facilitates the selection of efficient paths to desired states, allows the agent, under certain conditions, to plan provably optimal trajectories defined by a sequence of subgoals, and induces behavior similar to that of animals avoiding threatening stimuli.
Believing in dopamine
TLDR
Dopamine signals are implicated not only in reporting reward prediction errors but also in various probabilistic computations, and it is proposed that these different roles for dopamine can be placed within a common reinforcement learning framework.
A neurally plausible model learns successor representations in partially observable environments
TLDR
It is shown that distributional successor features can support reinforcement learning in noisy environments in which direct learning of successful policies is infeasible, and that they allow for efficient value-function computation in partially observable environments via the successor representation.
Biological Reinforcement Learning via Predictive Spacetime Encoding
TLDR
This study presents a novel RL model, called the spacetime Q-Network (STQN), that exploits predictive spatiotemporal encoding to learn reliably in highly uncertain environments and significantly outperforms several state-of-the-art RL models.
Linear reinforcement learning in planning, grid fields, and cognitive control
TLDR
A model for decision making in the brain is introduced that reuses a temporally abstracted map of future events to enable biologically realistic, flexible choice at the expense of specific, quantifiable biases.
Computational Neural Mechanisms of Goal-Directed Planning and Problem Solving
TLDR
This work presents a purely localist neural network model that can autonomously learn the structure of an environment and then achieve any arbitrary goal state in a changing environment without relearning reward values, and elucidates how neural inhibitory mechanisms can support competition between goal representations.
Successor Feature Sets: Generalizing Successor Representations Across Policies
TLDR
This paper develops a new, general successor-style representation, together with a Bellman equation that connects multiple sources of information within this representation, including different latent states, policies, and reward functions, and conducts experiments to explore which of the potential barriers to scaling are most pressing.
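
Several entries above (the multi-scale ensemble in particular) build on the same object: for a fixed policy with transition matrix T, the SR has the closed form M = (I - gamma * T)^(-1). The sketch below is illustrative only; the toy transition matrix, reward vector, and discount factors are assumptions, not values from any cited paper.

```python
import numpy as np

# Toy 3-state chain under a fixed policy (illustrative values only).
T = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],  # absorbing state that loops on itself
])

def successor_representation(T, gamma):
    """Closed-form SR for a fixed policy: M = (I - gamma * T)^(-1)."""
    n = T.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * T)

# A multi-scale "ensemble" in the sense used above is simply the SR computed
# at several discount factors, i.e. several predictive horizons.
ensemble = {g: successor_representation(T, g) for g in (0.5, 0.9, 0.99)}

# Reward revaluation without relearning structure: V(s) = sum_s' M(s,s') R(s').
R = np.array([0.0, 0.0, 1.0])  # illustrative reward vector
V = ensemble[0.9] @ R
print(V)
```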

References

SHOWING 1-10 OF 67 REFERENCES
The successor representation in human reinforcement learning
TLDR
The results suggest that the successor representation is a computational substrate for semi-flexible choice in humans, introducing a subtler, more cognitive notion of habit.
The Successor Representation and Temporal Context
TLDR
It is shown that a variant of the temporal context model (TCM) can be understood as directly estimating the successor representation using the temporal difference learning algorithm (Sutton & Barto, 1998), which leads to a generalization of TCM and new experimental predictions; a minimal sketch of this TD update appears after the reference list.
Predictive representations can link model-based reinforcement learning to model-free mechanisms
TLDR
This work lays out a family of approaches by which model-based computation may be built upon a core of TD learning, and suggests that this framework represents a neurally plausible family of mechanisms for model-based evaluation.
Reinforcement Learning and Episodic Memory in Humans and Animals: An Integrative Framework
TLDR
It is suggested that the ubiquitous and diverse roles of memory in RL may function as part of an integrated learning system that can efficiently approximate value functions over complex state spaces, learn with very little data, and bridge long‐term dependencies between actions and rewards.
Stimulus Representation and the Timing of Reward-Prediction Errors in Models of the Dopamine System
TLDR
The improved fit derives mostly from the absence of large negative errors in the new model, suggesting that dopamine alone can encode the full range of TD errors in these situations, including when rewards are omitted or received early.
Belief state representation in the dopamine system
TLDR
Dopamine neurons encode reward prediction errors (RPEs) that report the mismatch between expected reward and outcome for a given state, and when there is uncertainty about the current state, RPEs are calculated on the probabilistic representation of the current state, or belief state.
Representation and Timing in Theories of the Dopamine System
TLDR
The new theory assumes (in accord with recent computational theories of cortex) that problems of partial observability and stimulus history are solved in sensory cortex using statistical modeling and inference, and that the TD system predicts reward using the results of this inference rather than raw sensory data.
Human-level control through deep reinforcement learning
TLDR
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Dopamine, Inference, and Uncertainty
TLDR
It is postulated that orbitofrontal cortex transforms the stimulus representation through recurrent dynamics, such that a simple error-driven learning rule operating on the transformed representation can implement the Bayesian reinforcement learning update.
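
The TCM entry above describes estimating the successor representation with temporal-difference learning. The following is a minimal tabular sketch of that kind of TD(0) update for M, with a made-up trajectory and hyperparameters (none of it is taken from the cited papers):

```python
import numpy as np

def td_update_sr(M, s, s_next, alpha=0.1, gamma=0.9):
    """One TD(0)-style update of the successor representation M.

    The target for row s is the one-hot indicator of s plus the discounted
    SR row of the successor state: 1_s + gamma * M[s_next].
    """
    one_hot = np.eye(M.shape[0])[s]
    td_error = one_hot + gamma * M[s_next] - M[s]
    M[s] = M[s] + alpha * td_error
    return M

# Illustrative trajectory through a 3-state chain: 0 -> 1 -> 2 -> 2 -> 2
M = np.zeros((3, 3))
trajectory = [0, 1, 2, 2, 2]
for s, s_next in zip(trajectory[:-1], trajectory[1:]):
    M = td_update_sr(M, s, s_next)
print(M)
```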