Corpus ID: 53115163

Exploration by Random Network Distillation

@article{Burda2019ExplorationBR,
  title={Exploration by Random Network Distillation},
  author={Yuri Burda and Harrison Edwards and Amos J. Storkey and Oleg Klimov},
  journal={ArXiv},
  year={2019},
  volume={abs/1810.12894}
}
We introduce an exploration bonus for deep reinforcement learning methods that is easy to implement and adds minimal overhead to the computation performed. The bonus is the error of a neural network predicting features of the observations given by a fixed randomly initialized neural network. We also introduce a method to flexibly combine intrinsic and extrinsic rewards. We find that the random network distillation (RND) bonus combined with this increased flexibility enables significant progress on several hard exploration Atari games. In particular we establish state of the art performance on Montezuma's Revenge, a game famously…
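The bonus described above is simple enough to sketch directly. The snippet below is a minimal illustration, assuming PyTorch: a fixed, randomly initialized target network maps each observation to a feature vector, a predictor network is trained to match those features, and the prediction error serves as the intrinsic reward. The network sizes, flat observation shape, and bare training step are illustrative assumptions, not the paper's exact architecture or pipeline.

```python
# Minimal sketch of the RND bonus; sizes and the flat observation shape are
# illustrative assumptions, not the paper's CNN architecture.
import torch
import torch.nn as nn

obs_dim, feat_dim = 64, 32  # assumed observation / feature dimensions

def make_net(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, out_dim))

target = make_net(obs_dim, feat_dim)     # fixed, randomly initialized
predictor = make_net(obs_dim, feat_dim)  # trained to match the target's features
for p in target.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def intrinsic_reward(obs):
    """Prediction error of the random target's features: high for novel states."""
    with torch.no_grad():
        return (predictor(obs) - target(obs)).pow(2).mean(dim=-1)

# One update on a batch of observations collected by the agent (placeholder data):
obs_batch = torch.randn(128, obs_dim)
bonus = intrinsic_reward(obs_batch)          # exploration bonus per observation
loss = (predictor(obs_batch) - target(obs_batch)).pow(2).mean()  # distillation loss
opt.zero_grad()
loss.backward()
opt.step()
```

In the paper, observations and the intrinsic reward are additionally normalized, and the intrinsic and extrinsic reward streams are given separate value heads (with the intrinsic stream treated as non-episodic) before being combined, which is the flexible combination of rewards mentioned above.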

Citations

Transfer Learning with Random Network Distillation Theory & Reinforcement Learning
TLDR
The capability of policies learned by RND to transfer to other tasks is evaluated, including the ability to transfer insights acquired while solving one problem to more easily solve another.
GAN-based Intrinsic Exploration for Sample Efficient Reinforcement Learning
TLDR
A Generative Adversarial Network-based intrinsic reward module is proposed that learns the distribution of the observed states and emits an intrinsic reward that is high for states that are out of distribution, in order to lead the agent to unexplored states.
Optimistic Exploration with Backward Bootstrapped Bonus for Deep Reinforcement Learning
TLDR
An Optimistic Exploration algorithm with Backward Bootstrapped Bonus (OEB3) for DRL is proposed, and a UCB bonus indicating the uncertainty of Q-functions is constructed, which encourages the agent to explore scarcely visited states and actions to reduce uncertainty.
Offline Reinforcement Learning as Anti-Exploration
TLDR
This paper designs a new offline RL agent that is competitive with the state of the art on a set of continuous control locomotion and manipulation tasks, instantiated with a bonus based on the prediction error of a variational autoencoder.
Generative Exploration and Exploitation
TLDR
Generative Exploration and Exploitation automatically generates start states to encourage the agent to explore the environment and to exploit received reward signals, and can adaptively trade off between exploration and exploitation according to the varying distributions of states experienced by the agent as learning progresses.
Disentangling Exploitation from Exploration in Deep RL
TLDR
This work adopts a disruptive but simple and generic perspective in which exploration and exploitation are explicitly disentangled, showcases its sample efficiency and robustness, and discusses further implications.
Towards High-Level Intrinsic Exploration in Reinforcement Learning
TLDR
This work proposes a novel end-to-end intrinsic reward formulation that introduces high-level exploration in reinforcement learning and formulates curiosity as the error in an agent’s ability to reconstruct the observations given their contexts.
Maximum Entropy Model-based Reinforcement Learning
TLDR
This work designs a novel exploration method that takes into account features of the model-based approach and demonstrates through experiments that the method significantly improves the performance of the model-based algorithm Dreamer.
Self-Supervised Exploration via Latent Bayesian Surprise
TLDR
A curiosity-based bonus as intrinsic reward for Reinforcement Learning is proposed, computed as the Bayesian surprise with respect to a latent state variable, learnt by reconstructing fixed random features.
Bayesian Curiosity for Efficient Exploration in Reinforcement Learning
TLDR
A novel method based on Bayesian linear regression and latent space embedding to generate an intrinsic reward signal that encourages the learning agent to seek out unexplored parts of the state space is introduced.

References

SHOWING 1-10 OF 54 REFERENCES
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
TLDR
This paper considers the challenging Atari games domain, and proposes a new exploration method based on assigning exploration bonuses from a concurrently learned model of the system dynamics that provides the most consistent improvement across a range of games that pose a major challenge for prior methods.
Noisy Networks for Exploration
TLDR
It is found that replacing the conventional exploration heuristics for A3C, DQN and dueling agents with NoisyNet yields substantially higher scores for a wide range of Atari games, in some cases advancing the agent from sub-human to super-human performance.
Learning Montezuma's Revenge from a Single Demonstration
TLDR
A new method for learning from a single demonstration to solve hard exploration tasks like the Atari game Montezuma's Revenge is presented, along with a trained agent that achieves a high score of 74,500, better than any previously published result.
Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning
TLDR
This work proposes to learn a model of the MDP transition probabilities concurrently with the policy, and to form intrinsic rewards that approximate the KL-divergence of the true transition probabilities from the learned model, which results in using surprisal as intrinsic motivation; a minimal sketch of this idea follows the reference list below.
VIME: Variational Information Maximizing Exploration
TLDR
VIME is introduced, an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics which efficiently handles continuous state and action spaces and can be applied with several different underlying RL algorithms.
DORA The Explorer: Directed Outreaching Reinforcement Action-Selection
TLDR
This work proposes E-values, a generalization of counters that can be used to evaluate the propagating exploratory value over state-action trajectories, and shows that using E-values improves learning and performance over traditional counters.
Observe and Look Further: Achieving Consistent Performance on Atari
TLDR
This paper proposes an algorithm that addresses three key challenges that any algorithm needs to master in order to perform well on all games: processing diverse reward distributions, reasoning over long time horizons, and exploring efficiently.
Parameter Space Noise for Exploration
TLDR
This work demonstrates that RL with parameter noise learns more efficiently than traditional RL with action space noise and evolutionary strategies individually through experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete action environments as well as continuous control tasks.
Randomized Prior Functions for Deep Reinforcement Learning
TLDR
It is shown that this approach is efficient with linear representations, simple illustrations of its efficacy with nonlinear representations are provided, and it scales to large-scale problems far better than previous attempts.
EX2: Exploration with Exemplar Models for Deep Reinforcement Learning
TLDR
This work proposes a novelty detection algorithm for exploration that is based entirely on discriminatively trained exemplar models, where classifiers are trained to discriminate each visited state against all others, and shows that this kind of discriminative modeling corresponds to implicit density estimation.
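The surprisal-based intrinsic motivation summarized in the reference above ("Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning") can be sketched in a similar spirit. The snippet below is a hedged illustration rather than that paper's implementation: a dynamics model is fit to observed transitions, and the negative log-likelihood of each transition under the model, here a unit-variance Gaussian assumed for brevity, is used as the exploration bonus.

```python
# Hedged sketch of surprisal as an intrinsic reward, assuming PyTorch and a
# unit-variance Gaussian dynamics model; sizes and the training step are illustrative.
import torch
import torch.nn as nn

state_dim, action_dim = 8, 2  # assumed dimensions
dynamics = nn.Sequential(
    nn.Linear(state_dim + action_dim, 128), nn.ReLU(), nn.Linear(128, state_dim)
)  # predicts the mean of the next state
opt = torch.optim.Adam(dynamics.parameters(), lr=1e-3)

def surprisal(s, a, s_next):
    """-log p(s' | s, a) under a unit-variance Gaussian model, up to a constant."""
    pred = dynamics(torch.cat([s, a], dim=-1))
    return 0.5 * (pred - s_next).pow(2).sum(dim=-1)

# One model update on a batch of observed transitions (placeholder data);
# the detached surprisal can be added to the extrinsic reward as an exploration bonus.
s = torch.randn(64, state_dim)
a = torch.randn(64, action_dim)
s_next = torch.randn(64, state_dim)
bonus = surprisal(s, a, s_next).detach()
loss = surprisal(s, a, s_next).mean()
opt.zero_grad()
loss.backward()
opt.step()
```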