Corpus ID: 208175505

Self-Imitation Learning via Trajectory-Conditioned Policy for Hard-Exploration Tasks

  • Yijie Guo, Jong-wook Choi, Marcin Moczulski, Samy Bengio, Mohammad Norouzi, Honglak Lee
  • Published 2019
  • Mathematics, Computer Science
  • arXiv: Learning
  • Imitation learning from human-expert demonstrations has been shown to be greatly helpful for challenging reinforcement learning problems with sparse environment rewards. However, it is very difficult to achieve similar success without relying on expert demonstrations. Recent works on self-imitation learning showed that imitating the agent's own past good experience could indirectly drive exploration in some environments, but these methods often lead to sub-optimal and myopic behavior. To…
