Corpus ID: 195069379

Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates

@article{Penedones2019AdaptiveTL,
  title={Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates},
  author={Hugo Penedones and C. Riquelme and D. Vincent and Hartmut Maennel and T. Mann and Andr{\'e} Barreto and S. Gelly and Gergely Neu},
  journal={ArXiv},
  year={2019},
  volume={abs/1906.07987}
}
  • Hugo Penedones, C. Riquelme, D. Vincent, Hartmut Maennel, T. Mann, André Barreto, S. Gelly, Gergely Neu
  • Published 2019
  • Computer Science, Mathematics
  • ArXiv
  • We consider the core reinforcement-learning problem of on-policy value function approximation from a batch of trajectory data, and focus on various issues of Temporal Difference (TD) learning and Monte Carlo (MC) policy evaluation. The two methods are known to achieve complementary bias-variance trade-off properties, with TD tending to achieve lower variance but potentially higher bias. In this paper, we argue that the larger bias of TD can be a result of the amplification of local…
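
The contrast the abstract draws between MC and TD can be made concrete with the regression targets each method assigns along a single trajectory. The sketch below uses the standard textbook definitions only; it is not the adaptive method proposed in the paper, and the function names, the discount `gamma`, and the `next_values` estimates are illustrative assumptions.

```python
from typing import List, Sequence


def mc_targets(rewards: Sequence[float], gamma: float) -> List[float]:
    """Monte Carlo targets: the discounted return observed from each step onward."""
    targets = [0.0] * len(rewards)
    g = 0.0
    # Accumulate the return backwards: G_t = r_t + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        targets[t] = g
    return targets


def td0_targets(
    rewards: Sequence[float], next_values: Sequence[float], gamma: float
) -> List[float]:
    """TD(0) targets: r_t + gamma * V(s_{t+1}).

    next_values[t] is the current estimate of V(s_{t+1}) (assumed input),
    with 0.0 for the terminal transition.
    """
    return [r + gamma * v for r, v in zip(rewards, next_values)]


# Hypothetical three-step trajectory with a single reward at the end.
rewards = [0.0, 0.0, 1.0]
next_values = [0.0, 2.0, 0.0]  # assumed (possibly inaccurate) value estimates
mc = mc_targets(rewards, gamma=0.5)
td = td0_targets(rewards, next_values, gamma=0.5)
```

The MC targets depend only on observed rewards, so they are unbiased but vary with every random outcome later in the trajectory. The TD targets reuse the current estimates in `next_values`, which lowers variance but carries any error in those estimates into the target.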
    2 Citations

    Paper Mentions

    META-Learning State-based Eligibility Traces for More Sample-Efficient Policy Evaluation
    META-Learning Eligibility Traces for More Sample Efficient Temporal Difference Learning
