Q-learning
- C. Watkins, P. Dayan
- Computer Science · Machine-mediated learning
- 1 May 1992
This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
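The conditions in this summary (discrete action-values, every action sampled repeatedly in every state) can be illustrated with a minimal tabular sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the five-state chain environment, the uniformly random behaviour policy, the constant step size `alpha`, and the function name `q_learning`.

```python
import random


def q_learning(n_states=5, episodes=2000, alpha=0.1, gamma=0.9, seed=0):
    """Tabular Q-learning on a hypothetical deterministic chain.

    States are 0..n_states-1; the last state is terminal.
    Action 0 moves left (staying put at state 0), action 1 moves right.
    Entering the terminal state yields reward 1, everything else 0.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # Discrete representation: one table entry per (state, action) pair.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        for _ in range(100):  # step cap so an episode always ends
            # Uniformly random behaviour policy: every action keeps being
            # sampled in every state. Q-learning is off-policy, so the
            # table still tracks the greedy (optimal) values.
            a = rng.randrange(2)
            s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            done = s_next == goal
            r = 1.0 if done else 0.0
            # One-step target: reward plus discounted best next value.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if done:
                break
    return q


if __name__ == "__main__":
    for state, values in enumerate(q_learning()):
        print(state, [round(v, 3) for v in values])
```

In this deterministic toy problem a constant step size is enough for the table to settle; the general theorem concerns stochastic settings and places conditions on the learning rates that this sketch does not attempt to reproduce.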
A Neural Substrate of Prediction and Reward
- W. Schultz, P. Dayan, P. Montague
- Psychology, Biology · Science
- 14 March 1997
Findings in this work indicate that dopaminergic neurons in the primate, whose fluctuating output apparently signals changes or errors in the predictions of future salient and rewarding events, can be understood through quantitative theories of adaptive optimizing control.
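One way to make "errors in the predictions of future rewarding events" concrete is a temporal-difference (TD) prediction error. The sketch below assumes a TD(0) learner on a hypothetical three-step trial (cue, delay, reward); the trial structure, parameter values, and the name `run_trials` are illustrative choices, not details from the paper.

```python
def run_trials(n_trials=500, alpha=0.1, gamma=1.0):
    """TD(0) value learning over repeated, identical trials.

    Each trial visits three states in order: 0 (cue), 1 (delay),
    2 (reward delivery). Reward 1 arrives on leaving state 2.
    Returns the learned values and the prediction error recorded
    at the reward step of every trial.
    """
    n_states = 3
    rewards = [0.0, 0.0, 1.0]
    v = [0.0] * n_states
    errors_at_reward = []

    for _ in range(n_trials):
        for s in range(n_states):
            # Value of the successor state; zero once the trial is over.
            v_next = v[s + 1] if s + 1 < n_states else 0.0
            # Prediction error: what happened minus what was predicted.
            delta = rewards[s] + gamma * v_next - v[s]
            v[s] += alpha * delta
            if s == n_states - 1:
                errors_at_reward.append(delta)
    return v, errors_at_reward


if __name__ == "__main__":
    values, errors = run_trials()
    print("error at reward, first trial:", errors[0])
    print("error at reward, last trial:", errors[-1])
    print("learned values:", [round(x, 3) for x in values])
```

Under these assumptions the error at the reward step starts at 1 while the reward is unpredicted and shrinks toward 0 as the earlier states come to predict it.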
Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems
This text introduces the basic mathematical and computational methods of theoretical neuroscience and presents applications in a variety of areas including vision, sensory-motor integration, development, learning, and memory.
Technical Note: Q-Learning
- C. Watkins, P. Dayan
- Computer Science · Machine-mediated learning
- 1 May 1992
A convergence theorem is presented, proving that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control
This work considers dual-action choice systems from a normative perspective, and suggests a Bayesian principle of arbitration between them according to uncertainty, so each controller is deployed when it should be most accurate.
Cortical substrates for exploratory decisions in humans
- N. Daw, J. O’Doherty, P. Dayan, B. Seymour, R. Dolan
- Biology · Nature
- 15 June 2006
It is shown, in a gambling task, that human subjects' choices can be characterized by a computationally well-regarded strategy for addressing the explore/exploit dilemma, and a model of action selection under uncertainty that involves switching between exploratory and exploitative behavioural modes is suggested.
Model-based influences on humans’ choices and striatal prediction errors
- N. Daw, S. Gershman, B. Seymour, P. Dayan, R. Dolan
- Biology · Neuron
- 15 March 2011
Dissociable Roles of Ventral and Dorsal Striatum in Instrumental Conditioning
- J. O’Doherty, P. Dayan, J. Schultz, R. Deichmann, Karl J. Friston, R. Dolan
- Psychology, Biology · Science
- 16 April 2004
This work scanned human participants with functional magnetic resonance imaging while they engaged in instrumental conditioning; the results suggest partly dissociable contributions of the ventral and dorsal striatum to the critic and the actor.
Uncertainty, Neuromodulation, and Attention
- Angela J. Yu, P. Dayan
- Psychology · Neuron
- 19 May 2005
A framework for mesencephalic dopamine systems based on predictive Hebbian learning
- P. Montague, P. Dayan, T. Sejnowski
- Biology, Psychology · Journal of Neuroscience
- 1 March 1996
A theoretical framework is developed that shows how mesencephalic dopamine systems could distribute to their targets a signal that represents information about future expectations and shows that, through a simple influence on synaptic plasticity, fluctuations in dopamine release can act to change the predictions in an appropriate manner.