Q($λ$) with Off-Policy Corrections

Anna Harutyunyan, Marc G. Bellemare, Tom Stepleton, Rémi Munos
We propose and analyze an alternative approach to off-policy multi-step temporal difference learning, in which off-policy returns are corrected with the current Q-function in terms of rewards, rather than with the target policy in terms of transition probabilities. We prove that such approximate corrections are sufficient for off-policy convergence in both policy evaluation and control, provided certain conditions hold. These conditions relate the distance between the target and behavior policies, the…
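To make the idea of a reward-based correction concrete, here is a minimal tabular sketch in Python. It is an illustrative reading of the abstract, not code from the paper. The function name `corrected_return`, the array layout of `Q` and `pi`, and the trajectory format are all assumptions made for this example. The sketch adds to the current estimate a discounted sum of temporal-difference errors, each using the expectation of Q under the target policy at the next state, and it uses no importance weights on the behavior policy's action choices.

```python
import numpy as np

def corrected_return(Q, pi, trajectory, gamma=0.9, lam=0.5):
    """Multi-step target for the first (state, action) pair of a trajectory.

    Assumed (hypothetical) conventions for this sketch:
      Q          -- array of shape [n_states, n_actions], current estimates
      pi         -- array of shape [n_states, n_actions], target policy probabilities
      trajectory -- list of (state, action, reward, next_state) tuples,
                    generated by any behavior policy
    Terminal states are not handled here.
    """
    x0, a0, _, _ = trajectory[0]
    correction = 0.0
    for k, (x, a, r, x_next) in enumerate(trajectory):
        # Expected value of the next state under the target policy.
        expected_next = float(np.dot(pi[x_next], Q[x_next]))
        # TD error built from the current Q-function and the observed reward.
        td_error = r + gamma * expected_next - Q[x, a]
        # Errors further along the trajectory are damped by (gamma * lam)^k.
        correction += (gamma * lam) ** k * td_error
    return Q[x0, a0] + correction
```

With `lam=0` only the first TD error contributes, so the target reduces to a one-step expected backup. Larger values of `lam` lean more heavily on the off-policy trajectory, which is where conditions of the kind mentioned in the abstract would matter.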
