Corpus ID: 59413825

Go-Explore: a New Approach for Hard-Exploration Problems

@article{Ecoffet2019GoExploreAN,
  title={Go-Explore: a New Approach for Hard-Exploration Problems},
  author={Adrien Ecoffet and Joost Huizinga and Joel Lehman and Kenneth O. Stanley and Jeff Clune},
  journal={ArXiv},
  year={2019},
  volume={abs/1901.10995}
}
A grand challenge in reinforcement learning is intelligent exploration, especially when rewards are sparse or deceptive. Two Atari games serve as benchmarks for such hard-exploration domains: Montezuma's Revenge and Pitfall. On both games, current RL algorithms perform poorly, even those with intrinsic motivation, which is the dominant method to improve performance on hard-exploration domains. To address this shortfall, we introduce a new algorithm called Go-Explore. It exploits the following… 
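Although the abstract is truncated above, the loop that Go-Explore builds on (keep an archive of visited cells, return to a chosen cell, then explore from it) can be sketched as follows. This is a minimal, hedged sketch: the downscale-and-hash cell function, the uniform cell selection, and the assumption of a deterministic, resettable Gym-style environment are simplifications for illustration, not the paper's exact implementation.

```python
# Minimal sketch of an archive-based "go, then explore" loop.
# Assumptions (not the paper's exact method): a deterministic, resettable
# Gym-style env, a naive downscale-and-hash cell function, and uniform
# cell selection instead of Go-Explore's weighted selection heuristics.
import random
import numpy as np

def cell_of(obs):
    # Hypothetical coarse discretization: downscale the observation and hash it.
    arr = np.asarray(obs, dtype=np.uint8)
    return hash(arr[::8].tobytes())

def go_explore(env, n_iterations=1000, explore_steps=100):
    obs = env.reset()
    archive = {cell_of(obs): (0.0, [])}  # cell -> (best score, trajectory to it)

    for _ in range(n_iterations):
        # 1. Select a cell to return to (uniformly at random here).
        cell = random.choice(list(archive))
        score, trajectory = archive[cell]

        # 2. "Go": return to the cell by replaying its stored trajectory.
        env.reset()
        for action in trajectory:
            env.step(action)

        # 3. "Explore": take random actions from the reached state.
        traj, total = list(trajectory), score
        for _ in range(explore_steps):
            action = env.action_space.sample()
            obs, reward, done, _ = env.step(action)
            traj.append(action)
            total += reward
            c = cell_of(obs)
            # 4. Archive new cells, or higher-scoring routes to known cells.
            if c not in archive or total > archive[c][0]:
                archive[c] = (total, list(traj))
            if done:
                break
    return archive
```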
On Bonus Based Exploration Methods In The Arcade Learning Environment
TLDR
The results suggest that recent gains in Montezuma's Revenge may be better attributed to architecture change, rather than better exploration schemes; and that the real pace of progress in exploration research for Atari 2600 games may have been obfuscated by good results on a single domain.
BeBold: Exploration Beyond the Boundary of Explored Regions
TLDR
The regulated difference of inverse visitation counts is proposed as a simple but effective criterion for IR that helps the agent explore Beyond the Boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment.
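As a concrete reading of that criterion, a hedged sketch: the intrinsic reward is the clipped difference of inverse visitation counts between successive states. Exact table counts are used here; the pseudo-count estimation and episodic first-visit masking of the actual method are omitted.

```python
from collections import defaultdict

class InverseCountDifferenceBonus:
    """Sketch of a BeBold-style intrinsic reward: reward transitions that move
    from a well-visited state to a less-visited one, clipped at zero so the
    agent is pushed beyond the boundary of explored regions.
    Exact table counts stand in for the pseudo-counts used in practice."""

    def __init__(self):
        self.counts = defaultdict(int)

    def __call__(self, state_key, next_state_key):
        self.counts[next_state_key] += 1
        prev = max(self.counts[state_key], 1)  # avoid division by zero
        diff = 1.0 / self.counts[next_state_key] - 1.0 / prev
        return max(diff, 0.0)
```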
MADE: Exploration via Maximizing Deviation from Explored Regions
TLDR
This work proposes a new exploration approach via maximizing the deviation of the occupancy of the next policy from the explored regions, giving rise to a new intrinsic reward that adjusts existing bonuses.
Efficient Exploration by Novelty-Pursuit
TLDR
A goal-selection criterion for IMGEP based on the principle of MSEE is proposed, resulting in the new exploration method novelty-pursuit, which outperforms state-of-the-art approaches that use curiosity-driven exploration.
On Hard Exploration for Reinforcement Learning: a Case Study in Pommerman
TLDR
While model-free random exploration is typically futile in this domain, a model-based automatic reasoning module is developed that enables safer exploration by pruning actions that will surely lead the agent to death, and that can significantly improve learning.
NovelD: A Simple yet Effective Exploration Criterion
TLDR
This paper proposes a simple but effective criterion called NovelD, which solves all the static, procedurally generated tasks in MiniGrid within just 120M environment steps, without any curriculum learning, and which matches or outperforms multiple SOTA exploration methods on many hard-exploration tasks.
RIDE: Rewarding Impact-Driven Exploration
TLDR
This work proposes a novel type of intrinsic exploration bonus which rewards the agent for actions that change the agent's learned state representation and is more sample efficient than existing exploration methods, particularly for procedurally-generated MiniGrid environments.
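A hedged sketch of that bonus: the intrinsic reward is the change in a learned state embedding between consecutive steps, discounted by an episodic visitation count. The embedding function is assumed to be trained elsewhere (RIDE learns it with forward and inverse dynamics losses), so phi and the state key below are placeholders.

```python
import numpy as np
from collections import defaultdict

class ImpactDrivenBonus:
    """Sketch of a RIDE-style intrinsic reward: the L2 change in a learned
    state embedding, scaled down by how often the new state has been visited
    in the current episode. The embedding phi is an assumed, externally
    trained model."""

    def __init__(self, phi):
        self.phi = phi                      # state -> embedding vector
        self.episodic_counts = defaultdict(int)

    def reset_episode(self):
        self.episodic_counts.clear()

    def __call__(self, state, next_state, next_state_key):
        self.episodic_counts[next_state_key] += 1
        impact = np.linalg.norm(self.phi(next_state) - self.phi(state))
        return float(impact) / np.sqrt(self.episodic_counts[next_state_key])
```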
Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning
TLDR
This paper provides a simple way to learn options (sequences of actions) instead of having to handcraft them, and demonstrates the performance advantage in three navigation tasks.
Task-agnostic Exploration in Reinforcement Learning
TLDR
An efficient task-agnostic RL algorithm that finds near-optimal policies for N arbitrary tasks after at most Õ(log(N)·H^5·S·A/ε^2) exploration episodes, and provides an N-independent sample complexity bound for UCBZero in the statistically easier setting where the ground-truth reward functions are known.

References

Showing 1-10 of 113 references
Deep Curiosity Search: Intra-Life Exploration Improves Performance on Challenging Deep Reinforcement Learning Problems
TLDR
Deep Curiosity Search is introduced, which encourages intra-life exploration by rewarding agents for visiting as many different states as possible within each episode, and it is shown that DeepCS matches the performance of current state-of-the-art methods on Montezuma's Revenge.
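A hedged sketch of that intra-life idea: give a reward the first time each coarsely discretized state is visited within an episode, and clear the memory when the episode ends. The discretization function is an assumed stand-in for the paper's coarse curiosity grid.

```python
class IntraLifeNoveltyBonus:
    """Sketch of an intra-life (within-episode) exploration bonus: reward the
    first visit to each discretized state in the current episode only.
    The discretize function is an assumed stand-in for a coarse state grid."""

    def __init__(self, discretize):
        self.discretize = discretize
        self.visited = set()

    def reset_episode(self):
        self.visited.clear()

    def __call__(self, state):
        key = self.discretize(state)
        if key in self.visited:
            return 0.0
        self.visited.add(key)
        return 1.0
```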
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
TLDR
This paper considers the challenging Atari games domain, and proposes a new exploration method based on assigning exploration bonuses from a concurrently learned model of the system dynamics that provides the most consistent improvement across a range of games that pose a major challenge for prior methods.
#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
TLDR
A simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep RL benchmarks, and simple hash functions are found to achieve surprisingly good results on many challenging tasks.
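A hedged sketch of such a count-based bonus with a simple static hash: observations are binarized by a fixed random projection (SimHash-style), the resulting codes are counted, and the bonus decays with the square root of the count. The code length and bonus coefficient below are illustrative choices.

```python
import numpy as np
from collections import defaultdict

class SimHashCountBonus:
    """Sketch of a count-based exploration bonus over hashed observations:
    project the observation with a fixed random Gaussian matrix, keep only
    the signs, count the resulting binary code, and pay beta / sqrt(count).
    k (code length) and beta are illustrative hyperparameters."""

    def __init__(self, obs_dim, k=32, beta=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.normal(size=(k, obs_dim))   # fixed random projection
        self.beta = beta
        self.counts = defaultdict(int)

    def __call__(self, obs):
        code = tuple((self.A @ np.asarray(obs, dtype=float) > 0).astype(int).tolist())
        self.counts[code] += 1
        return self.beta / np.sqrt(self.counts[code])
```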
Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents
TLDR
This paper shows that algorithms that have been invented to promote directed exploration in small-scale evolved neural networks via populations of exploring agents, specifically novelty search and quality diversity algorithms, can be hybridized with ES to improve its performance on sparse or deceptive deep RL tasks, while retaining scalability.
Learning Montezuma's Revenge from a Single Demonstration
TLDR
A new method is presented for learning from a single demonstration to solve hard-exploration tasks like the Atari game Montezuma's Revenge, on which a trained agent achieves a high score of 74,500, better than any previously published result.
Observe and Look Further: Achieving Consistent Performance on Atari
TLDR
This paper proposes an algorithm that addresses three key challenges that any algorithm needs to master in order to perform well on all games: processing diverse reward distributions, reasoning over long time horizons, and exploring efficiently.
GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms
TLDR
This paper presents the GEP-PG approach, which takes the best of both worlds by sequentially combining a Goal Exploration Process with two variants of DDPG, evaluated on a low-dimensional deceptive-reward problem and on the larger Half-Cheetah benchmark.
Contingency-Aware Exploration in Reinforcement Learning
TLDR
This study develops an attentive dynamics model (ADM) that discovers controllable elements of the observations, which are often associated with the location of the character in Atari games; the results confirm that contingency-awareness is indeed an extremely powerful concept for tackling exploration problems in reinforcement learning.
Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning
TLDR
This work proposes to learn a model of the MDP transition probabilities concurrently with the policy, and to form intrinsic rewards that approximate the KL-divergence of the true transition probabilities from the learned model, which results in using surprisal as intrinsic motivation.
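A hedged sketch of using surprisal as the bonus: the intrinsic reward is the negative log-likelihood of the observed next state under the learned dynamics model. Here the model is assumed to be a diagonal Gaussian whose mean and standard deviation are produced by a model trained elsewhere.

```python
import numpy as np

def surprisal_bonus(next_state, pred_mean, pred_std):
    """Sketch of a surprisal intrinsic reward: negative log-likelihood of the
    observed next state under a (diagonal Gaussian) learned dynamics model.
    pred_mean and pred_std are assumed outputs of a model trained concurrently
    with the policy, which is outside this sketch."""
    next_state = np.asarray(next_state, dtype=float)
    var = np.asarray(pred_std, dtype=float) ** 2
    nll = 0.5 * np.sum((next_state - pred_mean) ** 2 / var + np.log(2 * np.pi * var))
    return float(nll)
```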
Exploration by Random Network Distillation
TLDR
An exploration bonus for deep reinforcement learning methods that is easy to implement and adds minimal overhead to the computation performed and a method to flexibly combine intrinsic and extrinsic rewards that enables significant progress on several hard exploration Atari games is introduced.
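A hedged sketch of that bonus: a trainable predictor network is regressed toward a fixed, randomly initialized target network, and the prediction error on an observation serves as the exploration bonus (novel observations are predicted poorly). The network sizes are illustrative, and the paper's scheme for combining intrinsic and extrinsic rewards is omitted.

```python
import torch
import torch.nn as nn

class RNDBonus(nn.Module):
    """Sketch of a Random Network Distillation bonus: the squared error of a
    trainable predictor against a frozen, randomly initialized target network.
    The same error is both the exploration bonus and the predictor's loss."""

    def __init__(self, obs_dim, feat_dim=64):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                    nn.Linear(128, feat_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                       nn.Linear(128, feat_dim))
        for p in self.target.parameters():
            p.requires_grad_(False)      # the target network is never trained

    def forward(self, obs):
        with torch.no_grad():
            target_feat = self.target(obs)
        pred_feat = self.predictor(obs)
        return ((pred_feat - target_feat) ** 2).mean(dim=-1)  # per-sample bonus
```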