Temporal Difference Learning

Abstract

This article introduces a class of incremental learning procedures specialized for prediction, that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic, they have remained poorly understood. Here we prove their convergence and optimality for special cases and relate them to supervised-learning methods. For most real-world prediction problems, temporal-difference methods require less memory and less peak computation than conventional methods, and they produce more accurate predictions. We argue that most problems to which supervised learning is currently applied are really prediction problems of the sort to which temporal-difference methods can be applied to advantage.
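
The contrast between the two credit-assignment rules described in the abstract can be illustrated with a minimal sketch. It assumes a tabular setting (one stored prediction per state), a single scalar outcome at the end of each sequence, and a one-step update; these choices, along with the function names `td_update` and `supervised_update` and the step size `alpha`, are illustrative assumptions and are not taken from the article, whose methods are more general.

```python
def td_update(values, states, outcome, alpha=0.1):
    """One-step temporal-difference update (illustrative sketch).

    Each prediction is moved toward the *next* prediction in the sequence;
    only the last prediction is moved toward the actual outcome.
    """
    for t, s in enumerate(states):
        if t + 1 < len(states):
            target = values[states[t + 1]]  # temporally successive prediction
        else:
            target = outcome                # actual outcome at sequence end
        values[s] += alpha * (target - values[s])
    return values


def supervised_update(values, states, outcome, alpha=0.1):
    """Conventional update (illustrative sketch).

    Every prediction is moved toward the final outcome, so the whole
    sequence must be kept until the outcome is known.
    """
    for s in states:
        values[s] += alpha * (outcome - values[s])
    return values


if __name__ == "__main__":
    sequence = ["A", "B"]  # hypothetical two-state sequence
    print(td_update({"A": 0.5, "B": 0.5}, sequence, outcome=1.0, alpha=0.5))
    print(supervised_update({"A": 0.5, "B": 0.5}, sequence, outcome=1.0, alpha=0.5))
```

In this sketch the temporal-difference rule can update each prediction as soon as the next one is available, whereas the conventional rule has to wait for the outcome; this is one way to read the abstract's claim about memory and peak computation.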

Cite this paper

@inproceedings{Michalski1988TemporalDL, title={Temporal Difference Learning}, author={D. Michalski}, year={1988} }