# Learning to Predict by the Methods of Temporal Differences

@article{Sutton2005LearningTP,
title={Learning to Predict by the Methods of Temporal Differences},
author={Richard S. Sutton},
journal={Machine Learning},
year={1988},
volume={3},
pages={9--44}
}
• R. Sutton
• Published 1 August 1988
• Psychology
• Machine Learning
This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's…
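The contrast drawn in the abstract can be sketched concretely on the bounded random-walk prediction task often associated with this paper. In the sketch below, TD(0) updates each state's prediction toward the *temporally successive* prediction rather than waiting for the final outcome; the state layout, step size, and episode count are illustrative choices, not taken from the paper itself.

```python
import random

def run_episode():
    """Random walk over states 0..6, starting at 3; outcome is 1 at 6, 0 at 0."""
    s, visited = 3, [3]
    while 0 < s < 6:
        s += random.choice((-1, 1))
        visited.append(s)
    return visited, (1.0 if s == 6 else 0.0)

def td0_update(V, visited, outcome, alpha=0.05):
    """TD(0): move each prediction toward the temporally successive prediction."""
    for s, s_next in zip(visited[:-1], visited[1:]):
        target = outcome if s_next in (0, 6) else V[s_next]
        V[s] += alpha * (target - V[s])

random.seed(0)
V = {s: 0.5 for s in range(1, 6)}  # predictions for the nonterminal states
for _ in range(5000):
    visited, z = run_episode()
    td0_update(V, visited, z)
# True values are s/6, so V[1] < V[3] < V[5] should hold after training.
```

An outcome-based (supervised) learner would instead store the whole trajectory and update every visited state toward the final outcome `z`; the TD version needs only the current transition, which is the computational point the abstract emphasizes.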
3,564 Citations
Temporal-Difference Networks
• Computer Science
NIPS
• 2004
It is argued that TD networks represent a substantial extension of the abilities of TD methods and bring us closer to the goal of representing world knowledge in entirely predictive, grounded terms.
On the Worst-Case Analysis of Temporal-Difference Learning Algorithms
• Computer Science
Machine Learning
• 2004
Lower bounds on the performance of any algorithm for this learning problem are proved, and a similar analysis of the closely related problem of learning to predict in a model in which the learner must produce predictions for a whole batch of observations before receiving reinforcement is given.
The human as delta-rule learner.
• Psychology
• 2020
A long-standing debate in psychology concerns the best algorithmic description of learning. In delta-rule models, such as Rescorla-Wagner, beliefs are updated by a fixed proportion of errors in…
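The delta rule mentioned here fits in a few lines. In the sketch below, the belief moves by a fixed proportion of the prediction error on each trial; the learning rate and trial outcomes are illustrative choices, not values from the article.

```python
def delta_rule(belief, outcome, lr=0.2):
    """Move the belief by a fixed proportion (lr) of the prediction error."""
    return belief + lr * (outcome - belief)

belief = 0.0
for outcome in (1.0, 1.0, 1.0, 1.0, 1.0):  # five consecutive rewarded trials
    belief = delta_rule(belief, outcome)
# The residual error shrinks geometrically: belief == 1 - (1 - 0.2)**5
```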
Predicting Periodicity with Temporal Difference Learning
• Computer Science
ArXiv
• 2018
The results show that setting the discount rate to appropriately chosen complex numbers allows for online and incremental estimation of the Discrete Fourier Transform of a signal of interest with TD learning, and extends the types of knowledge representable by value functions, which are particularly useful for identifying periodic effects in the reward sequence.
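One way to see the idea: with a complex discount γ = β·e^{iω}, the discounted return Σ γᵏ rₖ is a discounted Fourier component of the reward signal, so its magnitude peaks when ω matches a periodicity in the rewards. The sketch below computes that return directly as a batch stand-in for the online TD estimator; β, the frequencies, and the signal are illustrative assumptions, not the paper's setup.

```python
import cmath
import math

def complex_discounted_return(rewards, omega, beta=0.9):
    """Discounted return with complex discount gamma = beta * exp(i*omega)."""
    gamma = beta * cmath.exp(1j * omega)
    g = 0j
    for r in reversed(rewards):  # Horner-style backward accumulation
        g = r + gamma * g
    return g

rewards = [math.cos(0.5 * t) for t in range(200)]  # periodic reward signal
matched = abs(complex_discounted_return(rewards, 0.5))
mismatched = abs(complex_discounted_return(rewards, 1.5))
# matched should be substantially larger than mismatched
```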
Directly Estimating the Variance of the λ-Return Using Temporal-Difference Methods
• Computer Science
ArXiv
• 2018
A method for estimating the variance of the λ-return directly using policy evaluation methods from reinforcement learning is contributed, significantly simpler than prior methods that independently estimate the second moment of the λ-return.
Learning to Predict Independent of Span
• Computer Science
ArXiv
• 2015
This work considers how to learn multi-step predictions efficiently and shows that the exact same predictions can be learned in a much more computationally congenial way, with uniform per-step computation that does not depend on the span of the predictions.
A Study of Temporal Citation Count Prediction using Reinforcement Learning
• Computer Science
• 2013
A model-free method and a model-based method are proposed for predicting citation counts in both long and short terms and it is suggested that, unlike previous citation count prediction results, temporal prediction of citation count in a longer time span is less accurate.
A teacher-student framework to distill future trajectories
• Computer Science
ICLR
• 2021
Instead of hand-designing how trajectories should be incorporated, a teacher network learns to extract relevant information from the trajectories and to distill it into target activations which guide a student model that can only observe the present.
On a Variance Reduction Correction of the Temporal Difference for Policy Evaluation in the Stochastic Continuous Setting
• Computer Science
ArXiv
• 2022
It is proved that standard learning algorithms based on the discretized temporal difference are doomed to fail when the time discretization tends to zero, and a variance-reduction correction of the temporal difference is proposed, leading to new learning algorithms that are stable with respect to vanishing time steps.

## References

Toward a modern theory of adaptive networks: expectation and prediction.
• Biology, Psychology
Psychological review
• 1981
The adaptive element presented learns to increase its response rate in anticipation of increased stimulation, producing a conditioned response before the occurrence of the unconditioned stimulus, and is in strong agreement with the behavioral data regarding the effects of stimulus context.
Learning and problem-solving with multilayer connectionist systems (adaptive, strategy learning, neural networks, reinforcement learning)
A novel algorithm is examined that combines aspects of reinforcement learning and a data-directed search for useful weights, and is shown to outperform reinforcement-learning algorithms.
Machine learning: a guide to current research
• Computer Science
• 1986
The Judge: A Case-Based Reasoning System and some Approaches to Knowledge Acquisition are reviewed.
Learning by statistical cooperation of self-interested neuron-like computing elements.
• A. Barto
• Computer Science
Human neurobiology
• 1985
It is argued that some of the longstanding problems concerning adaptation and learning by networks might be solvable by this form of cooperativity, and computer simulation experiments are described that show how networks of self-interested components that are sufficiently robust can solve rather difficult learning problems.
Computers and Thought
• Art
• 1963
Computers and Thought showcases the work of the scientists who not only defined the field of Artificial Intelligence, but who are responsible for having developed it into what it is today. Originally…