Time-Derivative Models of Pavlovian Reinforcement


This chapter presents a model of classical conditioning called the temporal-difference (TD) model. The TD model was originally developed as a neuron-like unit for use in adaptive networks (Sutton and Barto 1987; Sutton 1984; Barto, Sutton and Anderson 1983). In this chapter, however, we analyze it from the point of view of animal learning theory. Our intended audience is both animal learning researchers interested in computational theories of behavior and machine learning researchers interested in how their learning algorithms relate to, and may be constrained by, animal learning studies. For an exposition of the TD model from an engineering point of view, see Chapter 13 of this volume. We focus on what we see as the primary theoretical contribution of the TD and related models to animal learning theory: the hypothesis that reinforcement in classical conditioning is the time derivative of a composite association combining innate (US) and acquired (CS) associations. We call models based on some variant of this hypothesis time-derivative models, and we examine several of these models in relation to the TD model. We also briefly explore relationships with animal learning theories of reinforcement, including Mowrer's drive-induction theory (Mowrer 1960) and the Rescorla-Wagner model (Rescorla and Wagner 1972). Although the Rescorla-Wagner model is not a time-derivative model, it plays a central role in our exposition because it is well known and successful both as an animal learning model and as an adaptive-network learning
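The time-derivative hypothesis stated above can be illustrated with a minimal Python sketch. It is not the TD model itself, whose equations do not appear in this abstract. It assumes a discrete-time setting with one CS and one US, a composite signal formed as the acquired CS association plus a fixed innate US association, reinforcement taken as the one-step change in that composite, and a decaying CS eligibility trace. The function name `simulate`, the stimulus timings, and all parameter values are hypothetical choices made for illustration.

```python
def simulate(trials=200, beta=0.1, decay=0.5, us_strength=1.0):
    """Train one CS association with a generic time-derivative rule.

    Illustrative assumptions (not taken from the chapter):
    - the composite signal is y = v * cs + us_strength * us
    - reinforcement is the discrete time derivative y[t] - y[t-1]
    - learning is gated by a decaying trace of recent CS activity
    """
    # Hypothetical trial layout: CS on at steps 2-4, US at step 5.
    cs = [0, 0, 1, 1, 1, 0, 0, 0]
    us = [0, 0, 0, 0, 0, 1, 0, 0]

    v = 0.0  # acquired (CS) association, learned across trials
    for _ in range(trials):
        trace = 0.0   # eligibility trace of the CS, reset each trial
        y_prev = 0.0  # composite signal at the previous time step
        for t in range(len(cs)):
            # Composite association: acquired part plus innate (US) part.
            y = v * cs[t] + us_strength * us[t]
            # Reinforcement is the change in the composite signal.
            reinforcement = y - y_prev
            # Update the association using the trace from earlier steps.
            v += beta * reinforcement * trace
            # Refresh the trace with the current CS activity.
            trace = decay * trace + (1.0 - decay) * cs[t]
            y_prev = y
    return v


if __name__ == "__main__":
    print(simulate())
```

In this sketch the CS association grows because US onset raises the composite signal while the CS trace is still high, whereas US offset lowers it after the trace has partly decayed. The learned value depends entirely on the assumed timings and trace decay.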

Cite this paper

@inproceedings{Sutton1990TimeDerivativeMO, title={Time-Derivative Models of Pavlovian Reinforcement}, author={Richard S. Sutton and Andrew G. Barto}, year={1990} }