Corpus ID: 211126477

Never Give Up: Learning Directed Exploration Strategies

@article{Badia2020NeverGU,
  title={Never Give Up: Learning Directed Exploration Strategies},
  author={Adri{\`a} Puigdom{\`e}nech Badia and P. Sprechmann and Alex Vitvitskyi and Daniel Guo and B. Piot and Steven Kapturowski and O. Tieleman and Mart{\'i}n Arjovsky and A. Pritzel and Andrew Bolt and Charles Blundell},
  journal={ArXiv},
  year={2020},
  volume={abs/2002.06038}
}
  • Adrià Puigdomènech Badia, P. Sprechmann, Alex Vitvitskyi, Daniel Guo, B. Piot, Steven Kapturowski, O. Tieleman, Martín Arjovsky, A. Pritzel, Andrew Bolt, Charles Blundell
  • Published 2020
  • Mathematics, Computer Science
  • ArXiv
  • We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies. We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies, thereby encouraging the agent to repeatedly revisit all states in its environment. A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
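The episodic intrinsic reward summarised above can be read as a kernel-based pseudo-count over the k nearest stored embeddings of the current episode. Below is a minimal Python sketch of that idea; the class name, hyperparameter values, and the running-mean distance normalisation are illustrative assumptions rather than the authors' exact implementation, and the embedding passed in would come from the self-supervised inverse-dynamics encoder mentioned in the abstract.

# A minimal sketch of an episodic novelty bonus built from k-nearest-neighbour
# lookups over embeddings of states visited in the current episode, in the
# spirit of the intrinsic reward described in the abstract. Class name,
# hyperparameters, and the running-mean normalisation are illustrative
# assumptions, not the authors' exact implementation.
import numpy as np


class EpisodicNoveltyBonus:
    def __init__(self, k=10, kernel_epsilon=1e-3, cluster_distance=8e-3,
                 pseudo_count_constant=1e-3, max_similarity=8.0):
        self.k = k
        self.kernel_epsilon = kernel_epsilon
        self.cluster_distance = cluster_distance
        self.pseudo_count_constant = pseudo_count_constant
        self.max_similarity = max_similarity
        self.memory = []          # embeddings stored during this episode
        self.mean_sq_dist = 1.0   # running mean of squared k-NN distances

    def reset(self):
        # The memory is episodic: it is wiped at every episode boundary,
        # so revisiting a state in a later episode is rewarded again.
        self.memory = []

    def compute_bonus(self, embedding):
        # Returns a reward that is large when the embedding is far from its
        # k nearest neighbours among this episode's previous embeddings.
        embedding = np.asarray(embedding, dtype=np.float64)
        if not self.memory:
            self.memory.append(embedding)
            return 1.0  # first state of the episode is treated as novel

        sq_dists = np.sum((np.stack(self.memory) - embedding) ** 2, axis=1)
        knn = np.sort(sq_dists)[: self.k]

        # Normalise by a running mean of k-NN distances so the kernel is
        # insensitive to the overall scale of the embedding space.
        self.mean_sq_dist = 0.99 * self.mean_sq_dist + 0.01 * knn.mean()
        normed = np.maximum(knn / max(self.mean_sq_dist, 1e-8)
                            - self.cluster_distance, 0.0)

        # Inverse-quadratic kernel: near neighbours contribute ~1, far ones ~0.
        similarity = self.kernel_epsilon / (normed + self.kernel_epsilon)
        score = np.sqrt(similarity.sum()) + self.pseudo_count_constant

        self.memory.append(embedding)
        if score > self.max_similarity:
            return 0.0  # region already visited many times this episode
        return 1.0 / score

Calling reset() at each episode boundary and compute_bonus(f(x_t)) at every step, with f the learned embedding network, yields a bonus that shrinks as the same region of embedding space is revisited within an episode and is restored once the episodic memory is cleared.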

    Citations

    Publications citing this paper.

    Agent57: Outperforming the Atari Human Benchmark

    CITES BACKGROUND & METHODS

    Group Equivariant Deep Reinforcement Learning

    CITES BACKGROUND
    HIGHLY INFLUENCED

    Rapid Task-Solving in Novel Environments

    CITES METHODS

    Temporally-Extended ε-Greedy Exploration

    CITES BACKGROUND

    References

    Publications referenced by this paper.

    Contingency-Aware Exploration in Reinforcement Learning

    Episodic Curiosity through Reachability

    Observe and Look Further: Achieving Consistent Performance on Atari

    Curiosity-Driven Exploration by Self-Supervised Prediction

    HIGHLY INFLUENTIAL

    Exploration by Random Network Distillation

    HIGHLY INFLUENTIAL