Time-Derivative Models of Pavlovian Reinforcement


This chapter presents a model of classical conditioning called the temporal-difference (TD) model. The TD model was originally developed as a neuron-like unit for use in adaptive networks (Sutton and Barto 1987; Sutton 1984; Barto, Sutton and Anderson 1983). Here, however, we analyze it from the point of view of animal learning theory. Our intended audience is both animal learning researchers interested in computational theories of behavior and machine learning researchers interested in how their learning algorithms relate to, and may be constrained by, animal learning studies. For an exposition of the TD model from an engineering point of view, see Chapter 13 of this volume. We focus on what we see as the primary contribution of the TD and related models to animal learning theory: the hypothesis that reinforcement in classical conditioning is the time derivative of a composite association combining innate (US) and acquired (CS) associations. We call models based on some variant of this hypothesis time-derivative models, and we examine several of these models in relation to the TD model. We also briefly explore relationships with animal learning theories of reinforcement, including Mowrer's drive-induction theory (Mowrer 1960) and the Rescorla-Wagner model (Rescorla and Wagner 1972). Although the Rescorla-Wagner model is not a time-derivative model, it plays a central role in our exposition because it is well known and successful both as an animal learning model and as an adaptive-network learning

References

A temporal-difference model of classical conditioning

  • R S Sutton, A G Barto
  • 1987

An adaptive network that constructs and uses an internal model of its environment

  • R S Sutton, A G Barto
  • 1981

Integrating behavioral and biological models of classical conditioning

  • N H Donegan, M A Gluck, R F Thompson
  • 1989

Simulation of a classically conditioned response: Components of the input trace and a cerebellar neural network implementation of the Sutton-Barto-Desmond model

  • D E J Blazis, J W Moore
  • 1987