Off-Policy Temporal Difference Learning with Function Approximation


We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms the basis for popular reinforcement learning methods such as Q-learning, which is known to diverge with linear function approximation, and because it is critical to the…
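The abstract says off-policy methods such as Q-learning can diverge with linear function approximation. A small, well-known case (often called the "w-to-2w" example) shows how this happens. It is a standard textbook illustration and is not taken from this paper. It is also not the paper's algorithm. Assume two states with features phi(s1) = 1 and phi(s2) = 2, so V(s1) = w and V(s2) = 2w. All rewards are 0, so the true values (w = 0) are exactly representable. Now suppose the off-policy sampling distribution only ever updates the transition s1 -> s2. The linear TD(0) update then multiplies w by 1 + alpha * (2 * gamma - 1) at every step, so w grows without bound whenever gamma > 0.5. The function name below is hypothetical.

```python
def off_policy_td0_two_state(gamma: float, alpha: float, w0: float, steps: int) -> list:
    """Apply linear TD(0) repeatedly to the single transition s1 -> s2.

    Hypothetical illustration: phi(s1) = 1, phi(s2) = 2, reward = 0.
    Only this transition is ever sampled (an off-policy update distribution),
    so each step is w <- w + alpha * (0 + gamma * 2w - w) * phi(s1).
    Returns the weight after each step, starting with w0.
    """
    phi_s1, phi_s2 = 1.0, 2.0
    w = w0
    history = [w]
    for _ in range(steps):
        # TD error for s1 -> s2 with zero reward: r + gamma * V(s2) - V(s1)
        td_error = 0.0 + gamma * phi_s2 * w - phi_s1 * w
        # Semi-gradient linear TD(0) step along the feature of s1
        w = w + alpha * td_error * phi_s1
        history.append(w)
    return history


# gamma = 0.9: per-step factor is 1 + 0.1 * (1.8 - 1) = 1.08, so w diverges
diverging = off_policy_td0_two_state(gamma=0.9, alpha=0.1, w0=1.0, steps=100)
# gamma = 0.4: per-step factor is 0.98, so w shrinks toward the true value 0
converging = off_policy_td0_two_state(gamma=0.4, alpha=0.1, w0=1.0, steps=100)
print(diverging[-1], converging[-1])
```

Under the on-policy distribution, s2 would also be updated. Those updates pull w back toward 0, and the instability disappears. The divergence comes from the mismatch between the update distribution and the policy being evaluated, which is the problem that stable off-policy methods must address.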


3 Figures and Tables


@inproceedings{Precup2001OffPolicyTD, title={Off-Policy Temporal Difference Learning with Function Approximation}, author={Doina Precup and Richard S. Sutton and Sanjoy Dasgupta}, booktitle={ICML}, year={2001} }