Corpus ID: 211532818

META-Learning State-based Eligibility Traces for More Sample-Efficient Policy Evaluation.

@article{Zhao2020METALearningSE,
  title={META-Learning State-based Eligibility Traces for More Sample-Efficient Policy Evaluation.},
  author={Mingde Zhao and Sitao Luan and Ian Porada and Xiao-Wen Chang and Doina Precup},
  journal={arXiv: Learning},
  year={2020}
}
Temporal-Difference (TD) learning is a standard and highly successful reinforcement learning approach, at the core of both algorithms that learn the value of a given policy and algorithms that learn how to improve policies. TD learning with eligibility traces boosts sample efficiency through temporal credit assignment, i.e., deciding what portion of a reward should be assigned to predecessor states visited at earlier times, controlled by a parameter $\lambda$…
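
For context on the technique the abstract describes, here is a minimal sketch of tabular TD($\lambda$) policy evaluation with accumulating eligibility traces, i.e., the standard fixed-$\lambda$ baseline, not the paper's meta-learning method. The 5-state random-walk environment and all parameter values are illustrative assumptions, not taken from the paper:

import numpy as np

def td_lambda_random_walk(num_episodes=1000, alpha=0.1, gamma=1.0, lam=0.9,
                          n_states=5, seed=0):
    """Tabular TD(lambda) evaluation of the equiprobable random policy on a
    small random walk: states 0..n_states-1, episodes start in the middle,
    stepping off the right end gives reward +1, off the left end reward 0."""
    rng = np.random.default_rng(seed)
    V = np.zeros(n_states)               # state-value estimates
    for _ in range(num_episodes):
        z = np.zeros(n_states)           # eligibility traces, reset per episode
        s = n_states // 2                # start in the middle state
        done = False
        while not done:
            step = rng.choice([-1, 1])   # equiprobable random policy
            s_next = s + step
            if s_next < 0:               # left termination, reward 0
                r, v_next, done = 0.0, 0.0, True
            elif s_next >= n_states:     # right termination, reward +1
                r, v_next, done = 1.0, 0.0, True
            else:
                r, v_next = 0.0, V[s_next]
            delta = r + gamma * v_next - V[s]   # one-step TD error
            z[s] += 1.0                         # accumulating trace for s
            V += alpha * delta * z              # credit error to all eligible states
            z *= gamma * lam                    # decay traces of older states
            s = s_next
    return V

if __name__ == "__main__":
    V = td_lambda_random_walk()
    print("estimated values:", np.round(V, 3))
    print("true values:     ", np.round(np.arange(1, 6) / 6, 3))

With $\lambda = 0$ this reduces to one-step TD; with $\lambda = 1$ it approaches Monte Carlo returns. Per the abstract, the paper instead meta-learns state-based trace parameters rather than fixing a single $\lambda$ for all states.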
