Learning to Predict by the Methods of Temporal Differences

  @article{sutton1988learning,
    title={Learning to Predict by the Methods of Temporal Differences},
    author={Richard S. Sutton},
    journal={Machine Learning},
    year={1988}
  }
  • R. Sutton
  • Published 1 August 1988
  • Psychology, Machine Learning
This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's… 
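The core idea — assigning credit from the difference between successive predictions rather than waiting for the final outcome — can be illustrated with a minimal tabular TD(0) sketch. The `td0_prediction` function and the two-state chain below are illustrative assumptions, not code from the paper:

```python
def td0_prediction(episode, n_states, alpha=0.1, gamma=1.0, n_episodes=2000):
    """Tabular TD(0): nudge each state's value toward the one-step
    bootstrapped target r + gamma * V(s'), so credit flows from the
    difference between temporally successive predictions."""
    V = [0.0] * n_states
    for _ in range(n_episodes):
        for s, r, s_next in episode():
            # A terminal successor contributes no bootstrapped value.
            bootstrap = gamma * V[s_next] if s_next is not None else 0.0
            V[s] += alpha * (r + bootstrap - V[s])
    return V

# Deterministic chain: s0 --r=0--> s1 --r=1--> terminal; true values are [1, 1].
values = td0_prediction(lambda: [(0, 0.0, 1), (1, 1.0, None)], n_states=2)
```

Note that the update for s0 never sees the final outcome directly; it learns entirely from s1's intermediate prediction, which is the distinction the abstract draws against conventional outcome-based methods.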
Temporal-Difference Networks
It is argued that TD networks represent a substantial extension of the abilities of TD methods and bring us closer to the goal of representing world knowledge in entirely predictive, grounded terms.
On the Worst-Case Analysis of Temporal-Difference Learning Algorithms
Lower bounds on the performance of any algorithm for this learning problem are proved, and a similar analysis of the closely related problem of learning to predict in a model in which the learner must produce predictions for a whole batch of observations before receiving reinforcement is given.
The human as delta-rule learner.
A long-standing debate in psychology concerns the best algorithmic description of learning. In delta-rule models, such as Rescorla-Wagner, beliefs are updated by a fixed proportion of errors in…
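The delta-rule update described here is a one-liner; a minimal sketch (the function name and learning-rate value are illustrative assumptions):

```python
def delta_rule(belief, outcome, lr=0.1):
    """Rescorla-Wagner style delta rule: adjust the belief by a fixed
    proportion (lr) of the prediction error (outcome - belief)."""
    return belief + lr * (outcome - belief)

# Repeated identical outcomes drive the belief exponentially toward the outcome.
belief = 0.0
for _ in range(100):
    belief = delta_rule(belief, 1.0)
```

The fixed proportion is what distinguishes this family from models that rescale the learning rate by uncertainty; after n identical outcomes the residual error shrinks as (1 - lr)**n.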
Predicting Periodicity with Temporal Difference Learning
The results show that setting the discount rate to appropriately chosen complex numbers allows for online and incremental estimation of the Discrete Fourier Transform of a signal of interest with TD learning, and extends the types of knowledge representable by value functions, which are particularly useful for identifying periodic effects in the reward sequence.
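A minimal single-chain sketch of that idea, assuming a period-P reward signal with one value estimate per phase (the `complex_td` function, the impulse signal, and the hyper-parameters are illustrative assumptions, not the paper's experiments): running ordinary TD(0) with a complex discount gamma = beta * exp(-1j * omega) drives each estimate toward a discounted Fourier-like component of the upcoming rewards.

```python
import cmath

def complex_td(rewards, omega, beta=0.9, alpha=0.05, sweeps=4000):
    """TD(0) over a periodic reward signal with complex discount
    gamma = beta * exp(-1j * omega); V[t] converges to the discounted
    sum  sum_k gamma**k * rewards[(t + 1 + k) % P]."""
    P = len(rewards)
    gamma = beta * cmath.exp(-1j * omega)
    V = [0j] * P                      # one complex-valued estimate per phase
    for _ in range(sweeps):
        for t in range(P):
            r_next = rewards[(t + 1) % P]
            delta = r_next + gamma * V[(t + 1) % P] - V[t]  # complex TD error
            V[t] += alpha * delta
    return V

# Period-4 impulse signal; omega = pi/2 picks out the period-4 frequency.
sig = [1.0, 0.0, 0.0, 0.0]
V = complex_td(sig, omega=cmath.pi / 2)
```

Because |gamma| < 1 the update is still a contraction, so the standard TD convergence argument carries over unchanged; only the value table becomes complex.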
Directly Estimating the Variance of the λ-Return Using Temporal-Difference Methods
A method for estimating the variance of the λ-return directly using policy evaluation methods from reinforcement learning is contributed, significantly simpler than prior methods that independently estimate the second moment of the λ-return.
Learning to Predict Independent of Span
This work considers how to learn multi-step predictions efficiently and shows that the exact same predictions can be learned in a much more computationally congenial way, with uniform per-step computation that does not depend on the span of the predictions.
A Study of Temporal Citation Count Prediction using Reinforcement Learning
A model-free method and a model-based method are proposed for predicting citation counts over both long and short terms, and it is suggested that, unlike previous citation count prediction results, temporal prediction of citation counts over a longer time span is less accurate.
A teacher-student framework to distill future trajectories
Instead of hand-designing how trajectories should be incorporated, a teacher network learns to extract relevant information from the trajectories and to distill it into target activations which guide a student model that can only observe the present.
On a Variance Reduction Correction of the Temporal Difference for Policy Evaluation in the Stochastic Continuous Setting
It is proved that standard learning algorithms based on the discretized temporal difference are doomed to fail when the time discretization tends to zero, and a variance-reduction correction of the temporal difference is proposed, leading to new learning algorithms that are stable with respect to vanishing time steps.


Toward a modern theory of adaptive networks: expectation and prediction.
The adaptive element presented learns to increase its response rate in anticipation of increased stimulation, producing a conditioned response before the occurrence of the unconditioned stimulus, and is in strong agreement with the behavioral data regarding the effects of stimulus context.
Learning and problem-solving with multilayer connectionist systems (adaptive, strategy learning, neural networks, reinforcement learning)
A novel algorithm is examined that combines aspects of reinforcement learning and a data-directed search for useful weights, and is shown to outperform reinforcement-learning algorithms.
Machine learning: a guide to current research
The Judge: A Case-Based Reasoning System and some Approaches to Knowledge Acquisition are reviewed.
Learning by statistical cooperation of self-interested neuron-like computing elements.
  • A. Barto
  • Computer Science
  • Human Neurobiology
  • 1985
It is argued that some of the longstanding problems concerning adaptation and learning by networks might be solvable by this form of cooperativity, and computer simulation experiments are described that show how networks of self-interested components that are sufficiently robust can solve rather difficult learning problems.
Computers and Thought
Computers and Thought showcases the work of the scientists who not only defined the field of Artificial Intelligence, but who are responsible for having developed it into what it is today. Originally…
Intelligent Behavior as an Adaptation to the Task Environment
This dissertation argues that examining more closely the way animate systems cope with real-world environments can provide valuable insights about the structural requirements for intelligent behavior.
Neuronlike adaptive elements that can solve difficult learning control problems
It is shown how a system consisting of two neuronlike adaptive elements can solve a difficult learning control problem and the relation of this work to classical and instrumental conditioning in animal learning studies and its possible implications for research in the neurosciences.
A neuronal model of classical conditioning
It is concluded that real-time learning mechanisms that do not require evaluative feedback from the environment are fundamental to natural intelligence and may have implications for artificial intelligence.
A neural model of adaptive behavior
A neurally plausible model of adaptive behavior is developed in the Boolean domain that is simple enough to be formally approached, but general enough to address a number of interesting issues.