Policy evaluation with temporal differences: a survey and comparison

Christoph Dann, Gerhard Neumann, Jan Peters
Journal of Machine Learning Research
Policy evaluation is an essential step in most reinforcement learning approaches. It yields a value function, the quality assessment of states for a given policy, which can be used in a policy improvement step. Since the late 1980s, this research area has been dominated by temporal-difference (TD) methods due to their data-efficiency. However, core issues such as stability guarantees in the off-policy scenario, improved sample efficiency and probabilistic treatment of the uncertainty in the…
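
To make the idea of TD policy evaluation concrete, here is a minimal sketch of tabular TD(0), the simplest member of the TD family. It is not taken from the survey. The environment (a five-state random walk with a reward of 1 on exiting to the right), the uniformly random policy, and all names and parameter values (`td0_random_walk`, `alpha`, `gamma`, `episodes`, `seed`) are assumptions chosen for illustration.

```python
import random


def td0_random_walk(num_states=5, episodes=5000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values of a random-walk chain with tabular TD(0).

    Illustrative assumptions: states 0..num_states-1 lie on a line, the
    policy steps left or right with equal probability, stepping off the
    left end ends the episode with reward 0, and stepping off the right
    end ends it with reward 1.
    """
    rng = random.Random(seed)
    values = [0.0] * num_states  # value estimate for each non-terminal state

    for _ in range(episodes):
        state = num_states // 2  # every episode starts in the middle
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= num_states:
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False

            # TD(0) update: move V(s) toward the bootstrapped target
            # r + gamma * V(s'), where terminal states have value 0.
            td_error = reward + gamma * next_value - values[state]
            values[state] += alpha * td_error

            if done:
                break
            state = next_state

    return values


estimates = td0_random_walk()
print([round(v, 2) for v in estimates])
```

For this particular chain the exact values under the random policy are 1/6, 2/6, ..., 5/6, so the printed estimates can be compared against them. With a constant step size the estimates keep fluctuating around those values rather than converging exactly.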



