Corpus ID: 211532691

RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments

@article{Raileanu2020RIDERI,
  title={RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments},
  author={Roberta Raileanu and Tim Rockt{\"a}schel},
  journal={ArXiv},
  year={2020},
  volume={abs/2002.12292}
}
Exploration in sparse reward environments remains one of the key challenges of model-free reinforcement learning. Instead of solely relying on extrinsic rewards provided by the environment, many state-of-the-art methods use intrinsic rewards to encourage exploration. However, we show that existing methods fall short in procedurally-generated environments where an agent is unlikely to visit a state more than once. We propose a novel type of intrinsic reward which encourages the agent to take… 
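The mechanism sketched in the abstract, rewarding actions whose effect is a large change in a learned state embedding while discounting states that have already been visited within the episode, can be illustrated in a few lines. Below is a minimal sketch, assuming an embedding network `phi` trained elsewhere with the paper's forward/inverse dynamics losses and a toy hashable state key; both are illustrative simplifications, not the authors' implementation.

```python
import torch
from collections import Counter


def ride_bonus(phi, obs, next_obs, episodic_counts: Counter) -> float:
    """Impact-driven intrinsic bonus: distance travelled in a learned
    embedding space, discounted by how often the next state has already
    been visited in the current episode.

    phi is assumed to be an embedding network trained elsewhere with
    forward/inverse dynamics losses; episodic_counts is reset at the
    start of every episode.
    """
    with torch.no_grad():
        impact = torch.norm(phi(next_obs) - phi(obs), p=2).item()
    key = tuple(next_obs.flatten().tolist())        # toy hashable state key
    episodic_counts[key] += 1
    return impact / (episodic_counts[key] ** 0.5)   # discount by sqrt of episodic count
```

The episodic count in the denominator is what keeps the bonus from being farmed by oscillating between two states whose embeddings differ.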
Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments
TLDR
Motivated by how humans distinguish good exploration behaviors by looking into the entire episode, RAPID is introduced, a simple yet effective episode-level exploration method for procedurally-generated environments that significantly outperforms the state-of-the-art intrinsic reward strategies in terms of sample efficiency and final performance.
Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning
TLDR
A novel approach that plans exploration actions far into the future by using a long-term visitation count and decouples exploration and exploitation by learning a separate function assessing the exploration value of the actions, which outperforms existing methods in environments with sparse rewards.
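The TLDR describes a separate function that scores actions by the exploration they unlock in the long run rather than by the immediate novelty of the next state. A toy tabular sketch of that idea, not the paper's exact update rule: a Q-learning-style backup where the reward is a count-based novelty bonus.

```python
import numpy as np


def update_exploration_value(W, counts, s, a, s_next, n_actions=4,
                             gamma=0.99, lr=0.1):
    """Toy long-term exploration value: a Q-learning-style backup over a
    count-based novelty bonus, so W(s, a) estimates how much future
    exploration an action unlocks rather than its extrinsic return.

    W and counts are dicts keyed by (state, action); unseen entries default to 0.
    """
    counts[(s, a)] = counts.get((s, a), 0) + 1
    bonus = 1.0 / np.sqrt(counts[(s, a)])                      # immediate novelty
    target = bonus + gamma * max(W.get((s_next, b), 0.0) for b in range(n_actions))
    W[(s, a)] = W.get((s, a), 0.0) + lr * (target - W.get((s, a), 0.0))
    return W[(s, a)]
```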
Multimodal Reward Shaping for Efficient Exploration in Reinforcement Learning
TLDR
A novel metric entitled Jain’s fairness index (JFI) is introduced to replace the entropy regularizer; it requires no additional models or memory, overcomes the vanishing intrinsic rewards problem, and can be generalized to arbitrary tasks.
BeBold: Exploration Beyond the Boundary of Explored Regions
TLDR
The regulated difference of inverse visitation counts is proposed as a simple but effective criterion for intrinsic rewards (IR) that helps the agent explore Beyond the Boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment.
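The criterion in this TLDR is compact enough to write out directly. A hedged sketch, assuming hashable states and the episodic first-visit gate described in the paper; names and the count handling are illustrative.

```python
from collections import Counter


def bebold_bonus(lifelong_counts: Counter, episodic_seen: set, s, s_next) -> float:
    """Regulated difference of inverse visitation counts: reward crossing
    from better-explored into less-explored territory, and only the first
    time the new state is reached in the current episode.

    States must be hashable; episodic_seen is cleared at episode start.
    """
    lifelong_counts[s_next] += 1
    frontier = max(1.0 / lifelong_counts[s_next] - 1.0 / max(lifelong_counts[s], 1), 0.0)
    first_visit = s_next not in episodic_seen
    episodic_seen.add(s_next)
    return frontier if first_visit else 0.0
```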
MADE: Exploration via Maximizing Deviation from Explored Regions
TLDR
This work proposes a new exploration approach via maximizing the deviation of the occupancy of the next policy from the explored regions, giving rise to a new intrinsic reward that adjusts existing bonuses.
Relevant Actions Matter: Motivating Agents
TLDR
This work proposes a new exploration method, called Relevant Actions Matter (RAM), shifting the emphasis from state novelty to states with relevant actions, and evaluates RAM on the procedurally-generated environment MiniGrid against state-of-the-art methods, showing that RAM greatly reduces sample complexity.
Improving Intrinsic Exploration with Language Abstractions
TLDR
This work evaluates whether language can improve over existing exploration methods by directly extending (and comparing to) competitive intrinsic exploration baselines: AMIGo (Campero et al., 2021) and NovelD (Zhang et al., 2021).
Decoupled Reinforcement Learning to Stabilise Intrinsically-Motivated Exploration
TLDR
Decoupled RL (DeRL) is introduced as a general framework which trains separate policies for intrinsically-motivated exploration and exploitation; decoupling allows DeRL to leverage the benefits of intrinsic rewards for exploration while demonstrating improved robustness and sample efficiency.
Explore and Control with Adversarial Surprise
TLDR
It is shown that Adversarial Surprise learns more complex behaviors, and explores more effectively than competitive baselines, outperforming intrinsic motivation methods based on active inference, novelty-seeking, and multi-agent unsupervised RL in MiniGrid, Atari and VizDoom environments.

References

SHOWING 1-10 OF 67 REFERENCES
Scheduled Intrinsic Drive: A Hierarchical Take on Intrinsically Motivated Exploration
TLDR
A new type of intrinsic reward denoted as successor feature control (SFC) is introduced, which takes into account statistics over complete trajectories and thus differs from previous methods that only use local information to evaluate intrinsic motivation.
Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning
TLDR
This work proposes to learn a model of the MDP transition probabilities concurrently with the policy, and to form intrinsic rewards that approximate the KL-divergence of the true transition probabilities from the learned model, which results in using surprisal as intrinsic motivation.
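A minimal sketch of surprisal as an intrinsic reward, assuming a learned dynamics model that predicts the mean of a unit-variance Gaussian over the next state; the paper's model and KL approximation are more elaborate than this.

```python
import torch


def surprisal_bonus(dynamics_model, s, a, s_next) -> float:
    """Intrinsic reward = -log p(s' | s, a) under a learned dynamics model,
    simplified here to a unit-variance Gaussian whose mean the model
    predicts: up to an additive constant, the bonus is half the squared
    prediction error. The model is assumed to be trained alongside the policy.
    """
    with torch.no_grad():
        pred_next = dynamics_model(s, a)
        return 0.5 * torch.sum((s_next - pred_next) ** 2).item()
```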
Deep Curiosity Search: Intra-Life Exploration Improves Performance on Challenging Deep Reinforcement Learning Problems
TLDR
Deep Curiosity Search is introduced, which encourages intra-life exploration by rewarding agents for visiting as many different states as possible within each episode, and it is shown that DeepCS matches the performance of current state-of-the-art methods on Montezuma's Revenge.
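The intra-life bonus described here reduces to rewarding first visits within an episode. A toy sketch, assuming observations have already been discretised into hashable cells (the discretisation itself is left out).

```python
def intra_life_bonus(episode_cells: set, cell) -> float:
    """Intra-life exploration bonus in the spirit of Deep Curiosity Search:
    +1 the first time a discretised state cell is visited in the current
    episode, 0 afterwards; episode_cells is cleared every episode.
    """
    if cell in episode_cells:
        return 0.0
    episode_cells.add(cell)
    return 1.0
```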
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
TLDR
This paper considers the challenging Atari games domain, and proposes a new exploration method based on assigning exploration bonuses from a concurrently learned model of the system dynamics that provides the most consistent improvement across a range of games that pose a major challenge for prior methods.
Go-Explore: a New Approach for Hard-Exploration Problems
TLDR
A new algorithm called Go-Explore is introduced, which exploits the following principles: remember previously visited states, solve simulated environments through any available means, and robustify via imitation learning; this results in a dramatic performance improvement on hard-exploration problems.
Large-Scale Study of Curiosity-Driven Learning
TLDR
This paper performs the first large-scale study of purely curiosity-driven learning, i.e. without any extrinsic rewards, across 54 standard benchmark environments, including the Atari game suite, and shows surprisingly good performance.
Curiosity-Driven Exploration by Self-Supervised Prediction
TLDR
This work forms curiosity as the error in an agent's ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model, which scales to high-dimensional continuous state spaces like images, bypasses the difficulties of directly predicting pixels, and ignores the aspects of the environment that cannot affect the agent.
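A minimal sketch of the curiosity bonus described here, assuming a feature encoder and forward model trained elsewhere with ICM's inverse/forward dynamics losses; the scaling factor `eta` is illustrative.

```python
import torch


def icm_bonus(phi, forward_model, obs, action_onehot, next_obs, eta=0.5) -> float:
    """Curiosity bonus in the spirit of ICM: the error of a forward model
    that predicts the next state's features from the current features and
    the action. phi (feature encoder) and forward_model are assumed to be
    trained elsewhere with ICM's inverse/forward dynamics losses.
    """
    with torch.no_grad():
        feat, next_feat = phi(obs), phi(next_obs)
        pred_next_feat = forward_model(torch.cat([feat, action_onehot], dim=-1))
        return eta * 0.5 * torch.sum((pred_next_feat - next_feat) ** 2).item()
```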
InfoBot: Transfer and Exploration via the Information Bottleneck
TLDR
This work proposes to learn about decision states from prior experience by training a goal-conditioned policy with an information bottleneck, and finds that this simple mechanism effectively identifies decision states, even in partially observed settings.
Reinforcement Learning with Unsupervised Auxiliary Tasks
TLDR
This paper significantly outperforms the previous state-of-the-art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks, leading to a mean speedup in learning of 10× and averaging 87% expert human performance on Labyrinth.
Contingency-Aware Exploration in Reinforcement Learning
TLDR
This study develops an attentive dynamics model (ADM) that discovers controllable elements of the observations, which are often associated with the location of the character in Atari games, confirming that contingency-awareness is indeed an extremely powerful concept for tackling exploration problems in reinforcement learning.