Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates
@inproceedings{Penedones2019AdaptiveTL,
  title={Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates},
  author={Hugo Penedones and C. Riquelme and D. Vincent and Hartmut Maennel and T. Mann and Andr{\'e} Barreto and S. Gelly and Gergely Neu},
  booktitle={NeurIPS},
  year={2019}
}
We consider the core reinforcement-learning problem of on-policy value function approximation from a batch of trajectory data, and focus on various issues of Temporal Difference (TD) learning and Monte Carlo (MC) policy evaluation. The two methods are known to achieve complementary bias-variance trade-off properties, with TD tending to achieve lower variance but potentially higher bias. In this paper, we argue that the larger bias of TD can be a result of the amplification of local…
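To make the TD versus MC distinction in the abstract concrete, here is a minimal tabular sketch. It is not the paper's adaptive method; it only contrasts plain every-visit Monte Carlo with plain TD(0) on a batch of trajectories. The function names, the `(states, rewards)` trajectory layout, the toy batch, and the step-size and sweep settings are all assumptions made for illustration.

```python
from collections import defaultdict


def mc_targets(rewards, gamma):
    """Discounted return from each time step of one trajectory."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def mc_evaluate(trajectories, gamma):
    """Every-visit Monte Carlo: average the observed returns per state."""
    totals, counts = defaultdict(float), defaultdict(int)
    for states, rewards in trajectories:
        for s, g in zip(states, mc_targets(rewards, gamma)):
            totals[s] += g
            counts[s] += 1
    return {s: totals[s] / counts[s] for s in totals}


def td0_evaluate(trajectories, gamma, alpha=0.1, sweeps=200):
    """Tabular TD(0): bootstrap from the current estimate of the next state."""
    v = defaultdict(float)
    for _ in range(sweeps):
        for states, rewards in trajectories:
            for t, (s, r) in enumerate(zip(states, rewards)):
                # Assumption: the episode terminates after the last recorded
                # state, so the value that follows it is taken to be 0.
                v_next = v[states[t + 1]] if t + 1 < len(states) else 0.0
                v[s] += alpha * (r + gamma * v_next - v[s])
    return dict(v)


# Hypothetical toy batch: state "A" is seen once and leads to "B";
# "B" is seen eight times in total, six of them ending with reward 1.
batch = [(["A", "B"], [0.0, 0.0])]
batch += [(["B"], [1.0])] * 6
batch += [(["B"], [0.0])]

mc = mc_evaluate(batch, gamma=1.0)
td = td0_evaluate(batch, gamma=1.0)
print("MC:", mc)
print("TD:", td)
```

In this batch the MC estimate for `A` rests on the single return observed from `A`, while TD(0) ties the estimate for `A` to the better-sampled estimate for `B`. That pooling is one source of TD's lower variance, and it is also why errors in a successor state's estimate can carry over into its predecessors.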
3 Citations
META-Learning Eligibility Traces for More Sample Efficient Temporal Difference Learning
- Computer Science, Mathematics
- ArXiv
- 2020
META-Learning State-based Eligibility Traces for More Sample-Efficient Policy Evaluation
- Computer Science, Mathematics
- AAMAS
- 2020
References
SHOWING 1-10 OF 33 REFERENCES
An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning
- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2016
- 124
A new Q(lambda) with interim forward view and Monte Carlo equivalence
- Mathematics, Computer Science
- ICML
- 2014
- 28
Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda
- Mathematics, Computer Science
- ICML
- 2010
- 24
A Greedy Approach to Adapting the Trace Parameter for Temporal Difference Learning
- Computer Science, Mathematics
- AAMAS
- 2016
- 20
Temporal Difference Learning with Neural Networks - Study of the Leakage Propagation Problem
- Computer Science, Mathematics
- ArXiv
- 2018
- 6
Adaptive Lambda Least-Squares Temporal Difference Learning
- Mathematics, Computer Science
- ArXiv
- 2016
- 10
Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view
- Mathematics, Computer Science
- ICML
- 2010
- 78
Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling
- Computer Science, Mathematics
- ICLR
- 2018
- 115