Policy evaluation with temporal differences: a survey and comparison

Christoph Dann, Gerhard Neumann, Jan Peters
Journal of Machine Learning Research
Policy evaluation is an essential step in most reinforcement learning approaches. It yields a value function, an assessment of the quality of each state under a given policy, which can then be used in a policy improvement step. Since the late 1980s, this research area has been dominated by temporal-difference (TD) methods because of their data efficiency. However, core issues such as stability guarantees in the off-policy scenario, improved sample efficiency and probabilistic treatment of the uncertainty in the…


