Off-Policy Temporal Difference Learning with Function Approximation


We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms the basis for popular reinforcement learning methods such as Q-learning, which has been known to diverge with linear function approximation, and because it is critical to the…



