First return then explore
@article{Ecoffet2021FirstRT,
  title   = {First return then explore},
  author  = {Adrien Ecoffet and Joost Huizinga and Joel Lehman and Kenneth O. Stanley and Jeff Clune},
  journal = {Nature},
  year    = {2021},
  volume  = {590},
  number  = {7847},
  pages   = {580--586}
}
Reinforcement learning promises to solve complex sequential-decision problems autonomously by specifying a high-level reward function only. However, reinforcement learning algorithms struggle when, as is often the case, simple and intuitive rewards provide sparse[1] and deceptive[2] feedback. Avoiding these pitfalls requires a thorough exploration of the environment, but creating algorithms that can do so remains one of the central challenges of the field. Here we hypothesize that the main…
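The title describes the core loop: first return to a previously reached, promising state, then explore from it. Below is a minimal sketch of that loop under illustrative assumptions: a deterministic, resettable Gym-style environment, a hypothetical `downscale` function that maps observations to discrete "cells", and a simple least-visited selection heuristic. None of these are the authors' exact implementation.

```python
def downscale(obs):
    """Illustrative cell mapping: conflate similar observations into one cell.
    (The paper uses domain-specific or downscaled-image representations.)"""
    return tuple(obs)  # assumes obs is already a small, discrete-valued sequence

def go_explore(env, iterations=1000, explore_steps=100):
    """Sketch of the exploration phase: select a promising cell from the archive,
    first *return* to it (here by replaying its stored action sequence in a
    deterministic simulator), then *explore* from it with random actions."""
    obs = env.reset()
    archive = {downscale(obs): {"trajectory": [], "score": 0.0, "visits": 0}}

    for _ in range(iterations):
        # Select a cell to return to, favouring rarely visited ones.
        cell, entry = min(archive.items(), key=lambda kv: kv[1]["visits"])
        entry["visits"] += 1

        # Return: replay the stored actions from a fresh reset.
        obs = env.reset()
        trajectory, score = list(entry["trajectory"]), 0.0
        for action in trajectory:
            obs, reward, done, _ = env.step(action)
            score += reward

        # Explore: random actions; archive any new or higher-scoring cells.
        for _ in range(explore_steps):
            action = env.action_space.sample()
            obs, reward, done, _ = env.step(action)
            trajectory.append(action)
            score += reward
            c = downscale(obs)
            if c not in archive or score > archive[c]["score"]:
                archive[c] = {"trajectory": list(trajectory), "score": score, "visits": 0}
            if done:
                break
    return archive
```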
87 Citations
Divide & Conquer Imitation Learning
- Computer Science, ArXiv
- 2022
This paper presents a novel algorithm that imitates complex robotic tasks from the states of an expert trajectory using a sequential inductive bias, and shows that it imitates a non-holonomic navigation task and scales to a complex simulated robotic manipulation task with very high sample efficiency.
GAN-based Intrinsic Exploration for Sample Efficient Reinforcement Learning
- Computer Science, ICAART
- 2022
A Generative Adversarial Network-based Intrinsic Reward Module is proposed that learns the distribution of observed states and emits an intrinsic reward that is high for out-of-distribution states, leading the agent toward unexplored states (a minimal sketch of this idea follows).
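The reward computation described above can be illustrated with a minimal, hypothetical sketch: it assumes the discriminator exposes a probability that a state comes from the distribution of previously observed states, and it is not the authors' exact module.

```python
import torch

def ood_intrinsic_reward(discriminator, state, scale=1.0):
    """Hypothetical bonus: reward states the discriminator judges unlikely to come
    from the distribution of previously observed states. Assumes
    `discriminator(state)` returns a probability in [0, 1] of being in-distribution."""
    with torch.no_grad():
        p_observed = discriminator(state)
    return scale * (1.0 - p_observed)  # high reward for out-of-distribution states
```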
Go-Blend Behavior and Affect
- Computer Science, Psychology, 2021 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)
- 2021
The proposed framework introduces a paradigm shift for affect modeling by viewing it as a reinforcement learning process, and empowers believable AI-based game testing by providing agents that can blend and express a multitude of behavioral and affective patterns.
Procedural Content Generation: Better Benchmarks for Transfer Reinforcement Learning
- Computer Science, 2021 IEEE Conference on Games (CoG)
- 2021
It is noted that another development, the rise of procedural content generation (PCG), can improve both benchmarking and generalization in transfer reinforcement learning (TRL), and that Alchemy and Meta-World are emerging as interesting benchmark suites.
BeBold: Exploration Beyond the Boundary of Explored Regions
- Computer Science, ArXiv
- 2020
The regulated difference of inverse visitation counts is proposed as a simple but effective criterion for intrinsic reward (IR) that helps the agent explore beyond the boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment (see the sketch below).
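The criterion can be written compactly. Below is a sketch of a regulated difference of inverse visitation counts, assuming discrete (e.g. hashed) state counts are available; clipping at zero means only transitions from better-known states into less-visited ones are rewarded.

```python
from collections import Counter

def boundary_bonus(counts: Counter, s, s_next):
    """Regulated difference of inverse visitation counts: positive only when the
    agent crosses from a more-visited state to a less-visited one, pushing
    exploration beyond the boundary of already-explored regions."""
    counts[s_next] += 1  # count the newly reached state
    return max(1.0 / counts[s_next] - 1.0 / max(counts[s], 1), 0.0)
```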
Improved Sample Complexity for Incremental Autonomous Exploration in MDPs
- Computer Science, NeurIPS
- 2020
A novel model-based approach is introduced that interleaves discovering new states from s_0 and improving the accuracy of a model estimate that is used to compute goal-conditioned policies; it is the first algorithm that can return an ε/c_min-optimal policy for any cost-sensitive shortest-path problem defined on the L-reachable states with minimum cost c_min.
A Unifying Framework for Reinforcement Learning and Planning
- Computer Science
- 2020
A unifying algorithmic framework for reinforcement learning and planning (FRAP), which identifies underlying dimensions on which MDP planning and learning algorithms have to decide and compares a variety of well-known planning, model-free and model-based RL algorithms along these dimensions.
Learning Design and Construction with Varying-Sized Materials via Prioritized Memory Resets
- Computer Science, ArXiv
- 2022
This paper develops a novel technique, prioritized memory resetting (PMR), which adaptively resets the state to the most critical configurations from a replay buffer so that the robot can resume training from partially built structures instead of from scratch.
Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey
- Computer Science, ArXiv
- 2020
At the intersection of deep RL and developmental approaches, a typology is proposed of methods in which deep RL algorithms are trained to tackle the developmental-robotics problem of autonomously acquiring open-ended repertoires of skills.
BYOL-Explore: Exploration by Bootstrapped Prediction
- Computer Science, ArXiv
- 2022
It is shown that BYOL-Explore is effective in DM-HARD-8, a challenging partially observable, continuous-action hard-exploration benchmark with visually rich 3-D environments, and achieves superhuman performance on the ten hardest exploration games in Atari while having a much simpler design than other competitive agents.
References
Showing 1-10 of 64 references
On Bonus-Based Exploration Methods
- Computer Science
- 2020
The results suggest that recent gains in Montezuma's Revenge may be better attributed to architecture changes rather than better exploration schemes, and that the real pace of progress in exploration research for Atari 2600 games may have been obfuscated by good results on a single domain.
MIME: Mutual Information Minimisation Exploration
- Computer Science, ArXiv
- 2020
This work proposes Mutual Information Minimising Exploration (MIME), a counter-intuitive solution for reinforcement learning agents that get stuck at abrupt environmental transition boundaries, in which the agent learns a latent representation of the environment without trying to predict future states.
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
- Computer Science, Nature
- 2020
The MuZero algorithm is presented, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics.
Grandmaster level in StarCraft II using multi-agent reinforcement learning
- Computer Science, Nature
- 2019
The agent, AlphaStar, is evaluated, which uses a multi-agent reinforcement learning algorithm and has reached Grandmaster level, ranking among the top 0.2% of human players for the real-time strategy game StarCraft II.
Combining Experience Replay with Exploration by Random Network Distillation
- Computer Science, 2019 IEEE Conference on Games (CoG)
- 2019
This work shows how to efficiently combine intrinsic rewards with experience replay in order to achieve more efficient and robust exploration (with respect to PPO/RND) and, consequently, better agent performance and sample efficiency (a minimal sketch of the RND bonus follows).
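The RND bonus referenced here measures novelty as the prediction error of a trained network against a fixed, randomly initialized target. The sketch below shows only that bonus; network sizes and how the bonus is mixed with replayed experience are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class RNDBonus(nn.Module):
    """Minimal Random Network Distillation bonus: a frozen random target network
    and a predictor trained to match it; the prediction error serves as the
    intrinsic reward (high for rarely seen observations)."""
    def __init__(self, obs_dim, feat_dim=64):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
        for p in self.target.parameters():
            p.requires_grad_(False)  # the target network stays fixed

    def forward(self, obs):
        # Per-sample squared prediction error doubles as the intrinsic reward.
        return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)

# Usage sketch: compute the bonus on observations sampled from the replay buffer,
# add it to the extrinsic reward for the policy update, and train the predictor
# by minimising the same error:
#   rnd = RNDBonus(obs_dim=128)
#   bonus = rnd(obs_batch)      # intrinsic reward per observation
#   bonus.mean().backward()     # predictor update step
```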
Deriving Subgoals Autonomously to Accelerate Learning in Sparse Reward Domains
- Computer Science, AAAI
- 2019
This work describes a new, autonomous approach for deriving subgoals from raw pixels that is more efficient than competing methods, and proposes a novel intrinsic reward scheme for exploiting the derived subgoals, applying it to three Atari games with sparse rewards.
Go-Explore: a New Approach for Hard-Exploration Problems
- Computer Science, ArXiv
- 2019
A new algorithm called Go-Explore, which exploits the following principles: remember previously visited states; first return to a promising state, then explore from it; and solve simulated environments through any available means, then robustify via imitation learning. This results in a dramatic performance improvement on hard-exploration problems.
Novelty Search and the Problem with Objectives
- Computer Science
- 2011
By synthesizing a growing body of work on search processes that are not driven by explicit objectives, this paper advances the hypothesis that there is a fundamental problem with the dominant paradigm…
Solving Montezuma's Revenge with Planning and Reinforcement Learning
- Computer Science
- 2017
This work applies planning and reinforcement learning approaches, combined with domain knowledge, to enable an agent to obtain better scores in Montezuma's Revenge, and hopes that these domain-specific algorithms can inspire better approaches to solving sequential decision problems (SDPs) with sparse feedback in general.
Contingency-Aware Exploration in Reinforcement Learning
- Computer Science, ICLR
- 2019
This study develops an attentive dynamics model (ADM) that discovers controllable elements of the observations, which are often associated with the location of the character in Atari games, confirming that contingency-awareness is indeed an extremely powerful concept for tackling exploration problems in reinforcement learning.