• Corpus ID: 59413825

# Go-Explore: a New Approach for Hard-Exploration Problems

@article{Ecoffet2019GoExploreAN,
  title   = {Go-Explore: a New Approach for Hard-Exploration Problems},
  author  = {Adrien Ecoffet and Joost Huizinga and Joel Lehman and Kenneth O. Stanley and Jeff Clune},
  journal = {ArXiv},
  year    = {2019},
  volume  = {abs/1901.10995}
}
• Published 30 January 2019
• Computer Science
• ArXiv
A grand challenge in reinforcement learning is intelligent exploration, especially when rewards are sparse or deceptive. Two Atari games serve as benchmarks for such hard-exploration domains: Montezuma's Revenge and Pitfall. On both games, current RL algorithms perform poorly, even those with intrinsic motivation, which is the dominant method to improve performance on hard-exploration domains. To address this shortfall, we introduce a new algorithm called Go-Explore. It exploits the following…
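The remember-and-return idea the abstract alludes to can be sketched as a minimal archive loop (Phase 1 of Go-Explore). This is a sketch, not the authors' implementation: `cell_fn`, `env.get_state`, and `env.restore_state` are hypothetical stand-ins for the paper's downscaled cell representation and for whatever save/restore facility a resettable simulator offers, and a gym-style `step` API is assumed.

```python
import random

def go_explore_phase1(env, cell_fn, iterations=1000, explore_steps=100):
    """Minimal sketch of Go-Explore's exploration phase.

    Assumes a deterministic, resettable simulator: env.get_state() and
    env.restore_state() are hypothetical save/restore hooks, and cell_fn
    maps an observation to a coarse, hashable "cell".
    """
    obs = env.reset()
    # Archive maps each discovered cell to the best state reaching it.
    archive = {cell_fn(obs): {"state": env.get_state(), "score": 0.0}}

    for _ in range(iterations):
        # Go: pick a previously visited cell and return to it directly.
        cell = random.choice(list(archive))
        env.restore_state(archive[cell]["state"])
        score = archive[cell]["score"]

        # Explore: take random actions from that restored state.
        for _ in range(explore_steps):
            obs, reward, done, _ = env.step(env.action_space.sample())
            score += reward
            c = cell_fn(obs)
            # Keep a cell if it is new or reached with a higher score.
            if c not in archive or score > archive[c]["score"]:
                archive[c] = {"state": env.get_state(), "score": score}
            if done:
                break
    return archive
```

Because the agent first *returns* to a frontier state before exploring, it avoids the detachment and derailment failure modes that plague purely intrinsic-motivation methods.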
## 228 Citations


On Bonus Based Exploration Methods In The Arcade Learning Environment
• Computer Science
ICLR
• 2020
The results suggest that recent gains in Montezuma's Revenge may be better attributed to architecture change, rather than better exploration schemes; and that the real pace of progress in exploration research for Atari 2600 games may have been obfuscated by good results on a single domain.
BeBold: Exploration Beyond the Boundary of Explored Regions
• Computer Science
ArXiv
• 2020
The regulated difference of inverse visitation counts is proposed as a simple but effective criterion for intrinsic reward (IR) that helps the agent explore Beyond the Boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment.
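The regulated-difference criterion above can be sketched with tabular visitation counts. This is a simplified illustration under stated assumptions: discrete, hashable states (the paper uses learned novelty estimates for pixel observations), and a per-episode first-visit gate as described; `BeBoldBonus` and its methods are hypothetical names.

```python
from collections import defaultdict

class BeBoldBonus:
    """Sketch of a regulated difference-of-inverse-counts intrinsic reward.

    Assumes discrete, hashable states; real implementations replace the
    tabular counts with learned novelty estimates.
    """

    def __init__(self):
        self.count = defaultdict(int)   # lifelong visitation counts
        self.seen_this_episode = set()  # episodic first-visit gate

    def reset_episode(self):
        self.seen_this_episode.clear()

    def bonus(self, s, s_next):
        self.count[s_next] += 1
        # Reward only "boundary crossings": the next state is rarer
        # than the current one (clipped at zero).
        ir = max(1.0 / self.count[s_next] - 1.0 / max(self.count[s], 1), 0.0)
        # Grant the bonus only on the first visit within an episode.
        if s_next in self.seen_this_episode:
            return 0.0
        self.seen_this_episode.add(s_next)
        return ir
```

Clipping at zero means the agent is never rewarded for retreating into well-visited territory, which is what mitigates the short-sightedness of plain count-based bonuses.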
On Bonus-Based Exploration Methods
• Computer Science
• 2020
The results suggest that recent gains in Montezuma's Revenge may be better attributed to architecture change, rather than better exploration schemes; and that the real pace of progress in exploration research for Atari 2600 games may have been obfuscated by good results on a single domain.
MADE: Exploration via Maximizing Deviation from Explored Regions
• Computer Science
NeurIPS
• 2021
This work proposes a new exploration approach via maximizing the deviation of the occupancy of the next policy from the explored regions, giving rise to a new intrinsic reward that adjusts existing bonuses.
Efficient Exploration by Novelty-Pursuit
• Computer Science
DAI
• 2020
A goal-selection criterion for intrinsically motivated goal exploration processes (IMGEP), based on the principle of maximum state entropy exploration (MSEE), is proposed, yielding the new exploration method novelty-pursuit, which outperforms state-of-the-art approaches that use curiosity-driven exploration.
On Hard Exploration for Reinforcement Learning: a Case Study in Pommerman
• Computer Science
AIIDE
• 2019
While model-free random exploration is typically futile, a model-based automatic reasoning module is developed that enables safer exploration by pruning actions that would surely lead the agent to death, significantly improving learning.
NovelD: A Simple yet Effective Exploration Criterion
• Computer Science
NeurIPS
• 2021
This paper proposes a simple but effective criterion called NovelD, which solves all the static procedurally-generated tasks in MiniGrid within just 120M environment steps without any curriculum learning, and matches or outperforms multiple SOTA exploration methods on many hard-exploration tasks.
RIDE: Rewarding Impact-Driven Exploration
• Computer Science
• 2019
This work proposes a novel type of intrinsic exploration bonus which rewards the agent for actions that change the agent's learned state representation and is more sample efficient than existing exploration methods, particularly for procedurally-generated MiniGrid environments.
Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning
• Computer Science
ArXiv
• 2022
This paper provides a simple way to learn options (sequences of actions) instead of having to handcraft them, and demonstrates the performance advantage in three navigation tasks.
Task-agnostic Exploration in Reinforcement Learning
• Computer Science
NeurIPS
• 2020
An efficient task-agnostic RL algorithm that finds near-optimal policies for N arbitrary tasks after at most $\tilde{O}(\log(N) H^5 S A / \epsilon^2)$ exploration episodes, and provides an $N$-independent sample complexity bound for UCBZero in the statistically easier setting where the ground-truth reward functions are known.

## References

Showing 1-10 of 113 references
Deep Curiosity Search: Intra-Life Exploration Improves Performance on Challenging Deep Reinforcement Learning Problems
• Computer Science
ArXiv
• 2018
Deep Curiosity Search is introduced, which encourages intra-life exploration by rewarding agents for visiting as many different states as possible within each episode, and it is shown that DeepCS matches the performance of current state-of-the-art methods on Montezuma's Revenge.
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
• Computer Science
ArXiv
• 2015
This paper considers the challenging Atari games domain and proposes a new exploration method that assigns exploration bonuses from a concurrently learned model of the system dynamics; this method provides the most consistent improvement across a range of games that pose a major challenge for prior methods.
#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
• Computer Science
NIPS
• 2017
A simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep RL benchmarks, and it is found that simple hash functions can achieve surprisingly good results on many challenging tasks.
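The count-based bonus described above can be sketched as $r^+ = \beta / \sqrt{n(\phi(s))}$, where $\phi$ hashes observations into discrete codes. This is a minimal sketch: the rounding-plus-SHA1 hash below is a generic stand-in (the paper studies SimHash and learned codes), and `HashCountBonus` is a hypothetical name.

```python
import hashlib
import math
from collections import defaultdict

class HashCountBonus:
    """Sketch of a count-based exploration bonus r+ = beta / sqrt(n(phi(s))).

    phi discretises an observation and hashes it, so that similar
    observations share a count; the paper uses SimHash or learned codes.
    """

    def __init__(self, beta=0.01, precision=1):
        self.beta = beta
        self.precision = precision
        self.counts = defaultdict(int)

    def _phi(self, obs):
        # Round each coordinate, then hash, so nearby states collide.
        rounded = tuple(round(x, self.precision) for x in obs)
        return hashlib.sha1(repr(rounded).encode()).hexdigest()

    def bonus(self, obs):
        code = self._phi(obs)
        self.counts[code] += 1
        return self.beta / math.sqrt(self.counts[code])
```

The bonus is added to the environment reward during training, so rarely-hashed states pay out more and the incentive decays as their counts grow.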
Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents
• Computer Science
NeurIPS
• 2018
This paper shows that algorithms that have been invented to promote directed exploration in small-scale evolved neural networks via populations of exploring agents, specifically novelty search and quality diversity algorithms, can be hybridized with ES to improve its performance on sparse or deceptive deep RL tasks, while retaining scalability.
Learning Montezuma's Revenge from a Single Demonstration
• Computer Science
ArXiv
• 2018
A new method is presented for learning from a single demonstration to solve hard-exploration tasks such as the Atari game Montezuma's Revenge; the trained agent achieves a high score of 74,500, better than any previously published result.
Observe and Look Further: Achieving Consistent Performance on Atari
• Computer Science
ArXiv
• 2018
This paper proposes an algorithm that addresses three key challenges that any algorithm needs to master in order to perform well on all games: processing diverse reward distributions, reasoning over long time horizons, and exploring efficiently.
GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms
• Computer Science
ICML
• 2018
This paper presents the GEP-PG approach, which takes the best of both worlds by sequentially combining a Goal Exploration Process with two variants of DDPG, evaluated on a low-dimensional deceptive-reward problem and on the larger Half-Cheetah benchmark.
Contingency-Aware Exploration in Reinforcement Learning
• Computer Science
ICLR
• 2019
This study develops an attentive dynamics model (ADM) that discovers controllable elements of the observations, which are often associated with the location of the character in Atari games; the results confirm that contingency-awareness is an extremely powerful concept for tackling exploration problems in reinforcement learning.
Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning
• Computer Science
ArXiv
• 2017
This work proposes to learn a model of the MDP transition probabilities concurrently with the policy, and to form intrinsic rewards that approximate the KL-divergence of the true transition probabilities from the learned model, which results in using surprisal as intrinsic motivation.
Exploration by Random Network Distillation
• Computer Science
ICLR
• 2019
An exploration bonus for deep reinforcement learning methods that is easy to implement and adds minimal overhead to the computation performed and a method to flexibly combine intrinsic and extrinsic rewards that enables significant progress on several hard exploration Atari games is introduced.
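The random network distillation bonus summarised above can be sketched with two tiny linear maps: a fixed, randomly initialised target network and a predictor trained online to imitate it, with the prediction error serving as the intrinsic reward. This is a simplified illustration only (the paper uses convolutional networks, observation normalisation, and separate value heads); `RNDBonus` and its parameters are hypothetical names.

```python
import math
import random

class RNDBonus:
    """Sketch of random network distillation (RND) with linear maps.

    The target network is frozen; the predictor takes one SGD step per
    observation, so prediction error (the bonus) stays high for novel
    observations and decays for familiar ones.
    """

    def __init__(self, obs_dim, feat_dim=8, lr=0.05, seed=0):
        rng = random.Random(seed)
        # Fixed, randomly initialised target network (never trained).
        self.target = [[rng.gauss(0, 1) for _ in range(feat_dim)]
                       for _ in range(obs_dim)]
        # Predictor network, trained online to match the target.
        self.predictor = [[0.0] * feat_dim for _ in range(obs_dim)]
        self.lr = lr
        self.feat_dim = feat_dim

    def bonus(self, obs):
        t = [sum(o * w for o, w in zip(obs, col))
             for col in zip(*self.target)]
        p = [sum(o * w for o, w in zip(obs, col))
             for col in zip(*self.predictor)]
        err = [pi - ti for pi, ti in zip(p, t)]
        # Intrinsic reward: mean squared prediction error.
        reward = sum(e * e for e in err) / self.feat_dim
        # One SGD step moving predictor features toward the target's.
        for i, o in enumerate(obs):
            for j, e in enumerate(err):
                self.predictor[i][j] -= self.lr * 2.0 * o * e / self.feat_dim
        return reward
```

Because the target is deterministic, the error is driven purely by predictor inexperience, which avoids the noisy-TV problem that afflicts forward-dynamics-based curiosity.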