• Corpus ID: 211532691

RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments

@article{Raileanu2020RIDERI,
title={RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments},
author={Roberta Raileanu and Tim Rockt{\"a}schel},
journal={ArXiv},
year={2020},
volume={abs/2002.12292}
}
• Published 27 February 2020
• Computer Science
• ArXiv
Exploration in sparse reward environments remains one of the key challenges of model-free reinforcement learning. Instead of solely relying on extrinsic rewards provided by the environment, many state-of-the-art methods use intrinsic rewards to encourage exploration. However, we show that existing methods fall short in procedurally-generated environments where an agent is unlikely to visit a state more than once. We propose a novel type of intrinsic reward which encourages the agent to take…
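The paper's core mechanism can be sketched as follows, assuming a toy encoder: the intrinsic reward is the magnitude of the change in a learned state embedding, discounted by the square root of an episodic visitation count so repeated transitions within an episode pay progressively less. The embedding vectors and state keys below are toy stand-ins, not the paper's implementation.

```python
import numpy as np
from collections import Counter

def ride_intrinsic_reward(phi_s, phi_s_next, episodic_counts, state_key):
    """Impact-driven intrinsic reward, sketched after the RIDE idea:
    the L2 change in a learned state embedding, divided by the square
    root of an episodic visitation count. `phi_s`/`phi_s_next` stand in
    for outputs of a learned encoder (toy vectors here)."""
    episodic_counts[state_key] += 1
    impact = np.linalg.norm(phi_s_next - phi_s)
    return impact / np.sqrt(episodic_counts[state_key])

counts = Counter()
# A large change in the embedding yields a large reward on first visit...
r_first = ride_intrinsic_reward(np.zeros(4), np.ones(4), counts, "s1")
# ...while the same transition is worth less on a repeat visit.
r_repeat = ride_intrinsic_reward(np.zeros(4), np.ones(4), counts, "s1")
```

The episodic discount is what keeps the bonus meaningful in procedurally-generated environments, where global visitation counts are useless because states rarely recur across episodes.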
56 Citations

Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments
• Computer Science
ICLR
• 2021
Motivated by how humans distinguish good exploration behaviors by looking into the entire episode, RAPID is introduced, a simple yet effective episode-level exploration method for procedurally-generated environments that significantly outperforms the state-of-the-art intrinsic reward strategies in terms of sample efficiency and final performance.
Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning
• Computer Science
Algorithms
• 2022
A novel approach that plans exploration actions far into the future by using a long-term visitation count and decouples exploration and exploitation by learning a separate function assessing the exploration value of the actions, which outperforms existing methods in environments with sparse rewards.
Multimodal Reward Shaping for Efficient Exploration in Reinforcement Learning
• Computer Science
ArXiv
• 2021
A novel metric entitled Jain’s fairness index (JFI) is introduced to replace the entropy regularizer, which requires no additional models or memory and overcomes the vanishing intrinsic rewards problem and can be generalized into arbitrary tasks.
BeBold: Exploration Beyond the Boundary of Explored Regions
• Computer Science
ArXiv
• 2020
The regulated difference of inverse visitation counts is proposed as a simple but effective criterion for IR that helps the agent explore Beyond the Boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment.
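The BeBold criterion can be sketched in a few lines, assuming plain tally counts (the paper approximates counts with learned networks): the bonus is the difference of inverse visitation counts, clipped at zero so only transitions into less-explored territory are rewarded.

```python
def bebold_criterion(count_s, count_s_next):
    """Regulated difference of inverse visitation counts (BeBold-style
    sketch): reward only transitions that cross from better-explored
    states into less-explored ones. Counts here are plain tallies; the
    paper approximates them with learned networks."""
    return max(1.0 / count_s_next - 1.0 / count_s, 0.0)

# Moving into a rarely visited state earns a bonus...
r_frontier = bebold_criterion(count_s=10, count_s_next=1)
# ...moving back toward well-explored territory earns none.
r_backtrack = bebold_criterion(count_s=1, count_s_next=10)
```

The clipping at zero is the "regulated" part: it prevents the agent from collecting reward by oscillating back and forth across a count boundary.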
MADE: Exploration via Maximizing Deviation from Explored Regions
• Computer Science
NeurIPS
• 2021
This work proposes a new exploration approach via maximizing the deviation of the occupancy of the next policy from the explored regions, giving rise to a new intrinsic reward that adjusts existing bonuses.
Relevant Actions Matter: Motivating Agents
• Computer Science
• 2021
This work proposes a new exploration method, called Relevant Actions Matter (RAM), shifting the emphasis from state novelty to states with relevant actions, and evaluates RAM on the procedurally-generated environment MiniGrid against state-of-the-art methods, showing that RAM greatly reduces sample complexity.
Improving Intrinsic Exploration with Language Abstractions
• Computer Science
ArXiv
• 2022
This work evaluates whether language can improve over existing exploration methods by directly extending (and comparing to) competitive intrinsic exploration baselines: AMIGo (Campero et al., 2021) and NovelD (Zhang et al., 2021).
Decoupled Reinforcement Learning to Stabilise Intrinsically-Motivated Exploration
• Computer Science
AAMAS
• 2022
Decoupled RL (DeRL) is introduced as a general framework which trains separate policies for intrinsically-motivated exploration and exploitation; decoupling allows DeRL to leverage the benefits of intrinsic rewards for exploration while demonstrating improved robustness and sample efficiency.
Explore and Control with Adversarial Surprise
• Computer Science
ArXiv
• 2021
It is shown that Adversarial Surprise learns more complex behaviors, and explores more effectively than competitive baselines, outperforming intrinsic motivation methods based on active inference, novelty-seeking, and multi-agent unsupervised RL in MiniGrid, Atari and VizDoom environments.

References

Showing 1-10 of 67 references
Scheduled Intrinsic Drive: A Hierarchical Take on Intrinsically Motivated Exploration
• Computer Science
ArXiv
• 2019
A new type of intrinsic reward denoted as successor feature control (SFC) is introduced, which takes into account statistics over complete trajectories and thus differs from previous methods that only use local information to evaluate intrinsic motivation.
Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning
• Computer Science
ArXiv
• 2017
This work proposes to learn a model of the MDP transition probabilities concurrently with the policy, and to form intrinsic rewards that approximate the KL-divergence of the true transition probabilities from the learned model, which results in using surprisal as intrinsic motivation.
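The surprisal bonus described above can be sketched with a toy transition model: the intrinsic reward is the negative log-likelihood of the observed next state under the learned model, here stood in for by a 1-D Gaussian with predicted mean `mu` and standard deviation `sigma` (both hypothetical placeholders for the paper's concurrently learned model).

```python
import math

def surprisal_reward(mu, sigma, s_next):
    """Surprisal-style intrinsic reward sketch: the negative
    log-likelihood of the observed next state under a learned
    transition model. The model here is a toy 1-D Gaussian; in the
    paper it is learned concurrently with the policy."""
    return (0.5 * math.log(2 * math.pi * sigma ** 2)
            + (s_next - mu) ** 2 / (2 * sigma ** 2))

# An outcome far from the model's prediction is more surprising,
# so it earns a larger intrinsic reward than an expected outcome.
r_expected = surprisal_reward(0.0, 1.0, 0.1)
r_surprising = surprisal_reward(0.0, 1.0, 3.0)
```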
Deep Curiosity Search: Intra-Life Exploration Improves Performance on Challenging Deep Reinforcement Learning Problems
• Computer Science
ArXiv
• 2018
Deep Curiosity Search is introduced, which encourages intra-life exploration by rewarding agents for visiting as many different states as possible within each episode, and it is shown that DeepCS matches the performance of current state-of-the-art methods on Montezuma's Revenge.
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
• Computer Science
ArXiv
• 2015
This paper considers the challenging Atari games domain, and proposes a new exploration method based on assigning exploration bonuses from a concurrently learned model of the system dynamics that provides the most consistent improvement across a range of games that pose a major challenge for prior methods.
Go-Explore: a New Approach for Hard-Exploration Problems
• Computer Science
ArXiv
• 2019
A new algorithm called Go-Explore is introduced, which exploits the following principles: remember previously visited states, solve simulated environments through any available means, and robustify via imitation learning; this results in a dramatic performance improvement on hard-exploration problems.
Large-Scale Study of Curiosity-Driven Learning
• Computer Science
ICLR
• 2019
This paper performs the first large-scale study of purely curiosity-driven learning, i.e. without any extrinsic rewards, across 54 standard benchmark environments, including the Atari game suite, and shows surprisingly good performance.
Curiosity-Driven Exploration by Self-Supervised Prediction
• Computer Science
2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
• 2017
This work forms curiosity as the error in an agent's ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model, which scales to high-dimensional continuous state spaces like images, bypasses the difficulties of directly predicting pixels, and ignores the aspects of the environment that cannot affect the agent.
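The curiosity signal described above can be sketched as a forward-model prediction error measured in a learned feature space rather than raw pixels. The feature vectors below are toy stand-ins for the encoder and forward-model outputs, which in the paper are trained jointly with an inverse dynamics model.

```python
import numpy as np

def icm_curiosity(phi_next_pred, phi_next):
    """ICM-style curiosity sketch: the forward model's squared
    prediction error in a learned feature space. Both vectors are
    toy stand-ins for network outputs."""
    return 0.5 * float(np.sum((phi_next_pred - phi_next) ** 2))

# A perfectly predicted transition yields no curiosity bonus...
r_known = icm_curiosity(np.array([1.0, 2.0]), np.array([1.0, 2.0]))
# ...while a poorly predicted one yields a positive bonus.
r_novel = icm_curiosity(np.array([0.0, 0.0]), np.array([1.0, 2.0]))
```

Measuring the error in feature space, rather than pixel space, is what lets the bonus ignore unpredictable but agent-irrelevant parts of the observation.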
InfoBot: Transfer and Exploration via the Information Bottleneck
• Computer Science
ICLR
• 2019
This work proposes to learn about decision states from prior experience by training a goal-conditioned policy with an information bottleneck, and finds that this simple mechanism effectively identifies decision states, even in partially observed settings.
Reinforcement Learning with Unsupervised Auxiliary Tasks
• Computer Science
ICLR
• 2017
This paper significantly outperforms the previous state-of-the-art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks, leading to a mean speedup in learning of 10x and averaging 87% expert human performance on Labyrinth.
Contingency-Aware Exploration in Reinforcement Learning
• Computer Science
ICLR
• 2019
This study develops an attentive dynamics model (ADM) that discovers controllable elements of the observations, which are often associated with the location of the character in Atari games, which confirms that contingency-awareness is indeed an extremely powerful concept for tackling exploration problems in reinforcement learning.