Corpus ID: 219708607

META-Learning Eligibility Traces for More Sample Efficient Temporal Difference Learning

@article{Zhao2020METALearningET,
  title={META-Learning Eligibility Traces for More Sample Efficient Temporal Difference Learning},
  author={Mingde Zhao},
  journal={ArXiv},
  year={2020},
  volume={abs/2006.08906}
}
  • Mingde Zhao · Published 2020 · Computer Science, Mathematics · ArXiv
Temporal-Difference (TD) learning is a standard and very successful reinforcement learning approach, at the core both of algorithms that learn the value of a given policy and of algorithms that learn how to improve policies. TD learning with eligibility traces provides a way to perform temporal credit assignment, i.e., to decide which portion of a reward should be assigned to predecessor states that occurred at different previous times, controlled by a parameter λ. However, tuning this parameter…
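Since the abstract is cut off before the details, here is a minimal sketch of the setting it describes: tabular TD(λ) policy evaluation with accumulating eligibility traces. The environment interface (env_step, env_reset), the policy callable, and all hyperparameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def td_lambda(env_step, env_reset, n_states, policy,
              alpha=0.1, gamma=0.99, lam=0.9, n_episodes=100):
    """Tabular TD(lambda) policy evaluation with accumulating traces."""
    V = np.zeros(n_states)
    for _ in range(n_episodes):
        z = np.zeros(n_states)            # eligibility trace per state
        s, done = env_reset(), False
        while not done:
            s_next, r, done = env_step(s, policy(s))
            # One-step TD error at the current transition.
            delta = r + (0.0 if done else gamma * V[s_next]) - V[s]
            z[s] += 1.0                   # accumulating trace: mark s as eligible
            V += alpha * delta * z        # assign credit to all eligible states
            z *= gamma * lam              # decay traces; lambda sets the credit horizon
            s = s_next
    return V
```

The trace decay is where λ matters: λ=0 recovers one-step TD, λ=1 approaches Monte Carlo returns, and intermediate values trade off bias against variance, which is the tuning problem the paper targets.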

    References

    Publications referenced by this paper (showing 1-10 of 34):

    A Greedy Approach to Adapting the Trace Parameter for Temporal Difference Learning
    Directly Estimating the Variance of the λ-Return Using Temporal-Difference Methods
    Off-policy TD(λ) with a true online equivalence
    Reinforcement Learning: An Introduction
    Analytical Mean Squared Error Curves for Temporal Difference Learning
    A new Q(λ) with interim forward view and Monte Carlo equivalence
    Meta-Gradient Reinforcement Learning (2018)
    Adaptive Lambda Least-Squares Temporal Difference Learning
    Adaptive Step-Size for Online Temporal Difference Learning
    Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates