Corpus ID: 196470924

Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling

@article{Luo2020LearningSP,
  title={Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling},
  author={Yuping Luo and Huazhe Xu and Tengyu Ma},
  journal={ArXiv},
  year={2020},
  volume={abs/1907.05634}
}
  • Computer Science, Mathematics
  • Imitation learning, followed by reinforcement learning algorithms, is a promising paradigm to solve complex control tasks sample-efficiently. However, learning from demonstrations often suffers from the covariate shift problem, which results in cascading errors of the learned policy. We introduce a notion of conservatively-extrapolated value functions, which provably lead to policies with self-correction. We design an algorithm Value Iteration with Negative Sampling (VINS) that practically…
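The abstract's core idea can be illustrated with a toy sketch: assign demonstration states their demonstrated values, assign perturbed (negative-sampled) states a value penalized by their distance from the demonstrations, and act greedily on that conservatively-extrapolated value so the policy drifts back toward the demonstrated states. Everything below (the 2-D line environment, the distance penalty, the random-candidate greedy step) is an illustrative assumption, not the paper's actual VINS algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 2-D demonstration states along the segment from
# (0, 0) to the goal (1, 1); not the paper's actual environment.
demo_states = np.linspace(0.0, 1.0, 50)[:, None] * np.ones((1, 2))

# Value of a demo state: negative distance-to-go, so later states score higher.
demo_values = -np.linalg.norm(demo_states - demo_states[-1], axis=1)

def negative_sample(state, noise_scale=0.1):
    """Negative sampling: perturb a demo state off the demonstration manifold."""
    return state + rng.normal(scale=noise_scale, size=state.shape)

def conservative_value(state, penalty=2.0):
    """Conservatively-extrapolated value: the nearest demo state's value minus
    a penalty growing with distance from the demonstrations, so the value
    landscape slopes back toward demonstrated states."""
    dists = np.linalg.norm(demo_states - state, axis=1)
    i = int(np.argmin(dists))
    return demo_values[i] - penalty * dists[i]

def greedy_step(state, step=0.05, n_candidates=64):
    """One-step greedy policy over random action candidates; because the value
    is conservatively extrapolated, the chosen step tends back toward the
    demonstrations (self-correction)."""
    candidates = state + step * rng.normal(size=(n_candidates, 2))
    candidates = np.vstack([state[None, :], candidates])  # staying put is allowed
    values = [conservative_value(c) for c in candidates]
    return candidates[int(np.argmax(values))]

# Perturb a mid-trajectory state off the manifold, then take one greedy step:
# the conservative value of the corrected state is at least as high.
off_manifold = negative_sample(demo_states[25])
corrected = greedy_step(off_manifold)
print(conservative_value(off_manifold), conservative_value(corrected))
```

The penalty term is what distinguishes this from naive value extrapolation: without it, states off the demonstration manifold could be assigned optimistic values and the policy would have no pressure to return, which is exactly the cascading-error failure mode the abstract describes.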
