Corpus ID: 213793858

Novelty Search in representational space for sample efficient exploration

@article{Tao2020NoveltySI,
  title={Novelty Search in representational space for sample efficient exploration},
  author={Ruo Yu Tao and Vincent François-Lavet and Joelle Pineau},
  journal={ArXiv},
  year={2020},
  volume={abs/2009.13579}
}
We present a new approach for efficient exploration which leverages a low-dimensional encoding of the environment learned with a combination of model-based and model-free objectives. Our approach uses intrinsic rewards that are based on the distance of nearest neighbors in the low dimensional representational space to gauge novelty. We then leverage these intrinsic rewards for sample-efficient exploration with planning routines in representational space for hard exploration tasks with sparse… 
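
As a concrete illustration of the nearest-neighbor novelty signal described in the abstract, here is a minimal sketch in Python; the encoder, visitation buffer, and choice of k are placeholders rather than the authors' exact architecture.

import numpy as np

def knn_novelty_bonus(z, visited_z, k=5):
    """Intrinsic reward: mean Euclidean distance from the encoded state z
    to its k nearest neighbors among previously visited encodings."""
    if len(visited_z) == 0:
        return 1.0  # arbitrary default before any state has been visited
    dists = np.linalg.norm(np.asarray(visited_z) - np.asarray(z), axis=1)
    nearest = np.sort(dists)[:k]
    return float(nearest.mean())

# usage sketch: z = encoder(obs); r_intrinsic = knn_novelty_bonus(z, encoding_buffer)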

Figures and Tables from this paper

[Figure: overview diagram with components Random Encoder, RL Encoder, Representation Space, Intrinsic Reward, Policy, Expected Reward, Extrinsic Reward]

State Entropy Maximization with Random Encoders for Efficient Exploration

The experiments show that RE3 significantly improves the sample-efficiency of both model-free and model-based RL methods on locomotion and navigation tasks from DeepMind Control Suite and MiniGrid benchmarks, and allows learning diverse behaviors without extrinsic rewards.

Dynamic Bottleneck for Robust Self-Supervised Exploration

A Dynamic Bottleneck (DB) model is proposed that learns a dynamics-relevant representation based on the information-bottleneck principle and encourages the agent to explore state-action pairs with high information gain; it outperforms several state-of-the-art exploration methods in noisy environments.

An information-theoretic perspective on intrinsic motivation in reinforcement learning: a survey

This work computationally revisits the notions of surprise, novelty and skill learning, and suggests that novelty and surprise can assist the building of a hierarchy of transferable skills that further abstracts the environment and makes the exploration process more robust.

Behavior From the Void: Unsupervised Active Pre-Training

A new unsupervised pre-training method for reinforcement learning, APT (Active Pre-Training), which learns behaviors and representations by actively searching for novel states in reward-free environments and maximizing a non-parametric entropy computed in an abstract representation space.
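
A rough sketch of the particle-based (k-nearest-neighbor) entropy bonus that APT-style methods compute over a batch of encoded states; the constants and batch interface below are placeholders, not the paper's exact formulation.

import numpy as np

def particle_entropy_bonus(z_batch, k=3):
    """Per-state bonus proportional to the log distance to the k-th nearest
    neighbor within the batch (a non-parametric entropy estimate up to constants)."""
    z = np.asarray(z_batch)                                     # (n, d), requires n > k
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)  # pairwise distances
    kth = np.sort(d, axis=1)[:, k]  # index 0 is the point itself (distance 0)
    return np.log(kth + 1.0)        # +1 keeps the log bounded below (a common stabilizer)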

Nuclear Norm Maximization Based Curiosity-Driven Learning

A novel curiosity method leveraging nuclear norm maximization (NNM), which quantifies the novelty of exploring the environment more accurately while providing high tolerance to noise and outliers; experiments suggest that NNM provides state-of-the-art performance compared with previous curiosity methods.

Cell-Free Latent Go-Explore

It is shown that LGE, although simpler than Go-Explore, is more robust and outperforms all state-of-the-art algorithms in terms of pure exploration on multiple hard-exploration environments.

Maximum Entropy Model-based Reinforcement Learning

This work designs a novel exploration method that takes into account features of the model-based approach and demonstrates through experiments that the method significantly improves the performance of the model-based algorithm Dreamer.

Dynamic Memory-based Curiosity: A Bootstrap Approach for Exploration

A novel curiosity method for RL, DyMeCu (Dynamic Memory-based Curiosity), consisting of a dynamic memory and dual online learners; the dynamic memory better mimics human curiosity, and the memory module can be grown dynamically based on a bootstrap paradigm with the dual learners.

References

SHOWING 1-10 OF 64 REFERENCES

Combined Reinforcement Learning via Abstract Representations

It is shown that the modularity brought by this approach leads to good generalization while being computationally efficient, with planning happening in a smaller latent state space, which opens up new strategies for interpretable AI, exploration and transfer learning.

Unifying Count-Based Exploration and Intrinsic Motivation

This work uses density models to measure uncertainty, and proposes a novel algorithm for deriving a pseudo-count from an arbitrary density model, which enables this technique to generalize count-based exploration algorithms to the non-tabular case.
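
For reference, the pseudo-count in that work is derived from the density model's probability of a state before and after it is observed; a direct transcription in Python follows (the bonus scale beta is a tunable coefficient, and the small offset simply avoids division by zero for unvisited states).

def pseudo_count(rho, rho_prime):
    """rho: density model probability of state x before training on x.
    rho_prime: its probability of x after a single update on x (the recoding probability)."""
    return rho * (1.0 - rho_prime) / (rho_prime - rho)

def count_bonus(n_hat, beta=0.05):
    # exploration bonus decaying like 1 / sqrt(pseudo-count)
    return beta / (n_hat + 0.01) ** 0.5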

Model-Based Active Exploration

This paper introduces an efficient active exploration algorithm, Model-Based Active eXploration (MAX), which uses an ensemble of forward models to plan to observe novel events and shows empirically that in semi-random discrete environments where directed exploration is critical to make progress, MAX is at least an order of magnitude more efficient than strong baselines.
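
MAX's utility is an information-theoretic disagreement measure over the ensemble; a generic stand-in that conveys the idea is the variance of the ensemble's next-state predictions, sketched below (the model interface is a placeholder, not MAX's exact objective).

import numpy as np

def ensemble_disagreement(models, state, action):
    """Novelty signal: disagreement (variance) across an ensemble of learned
    forward models, each a callable mapping (state, action) -> predicted next state."""
    preds = np.stack([m(state, action) for m in models])  # (n_models, state_dim)
    return float(preds.var(axis=0).mean())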

Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

This paper considers the challenging Atari games domain, and proposes a new exploration method based on assigning exploration bonuses from a concurrently learned model of the system dynamics that provides the most consistent improvement across a range of games that pose a major challenge for prior methods.

VIME: Variational Information Maximizing Exploration

VIME is introduced, an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics which efficiently handles continuous state and action spaces and can be applied with several different underlying RL algorithms.

Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning

This work proposes to learn a model of the MDP transition probabilities concurrently with the policy, and to form intrinsic rewards that approximate the KL-divergence of the true transition probabilities from the learned model, which results in using surprisal as intrinsic motivation.
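
A compact sketch of surprisal as an intrinsic reward, assuming a learned transition model exposing a log_prob method (a placeholder interface, not the paper's implementation).

def surprisal_bonus(model, s, a, s_next):
    # surprisal: negative log-likelihood of the observed transition under the learned model
    return -model.log_prob(s_next, s, a)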

Never Give Up: Learning Directed Exploration Strategies

This work constructs an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies, thereby encouraging the agent to repeatedly revisit all states in its environment.

An information-theoretic approach to curiosity-driven reinforcement learning

It is shown that Boltzmann-style exploration, one of the main exploration methods used in reinforcement learning, is optimal from an information-theoretic point of view, in that it optimally trades expected return for the coding cost of the policy.

#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning

A simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep RL benchmarks, and it is found that simple hash functions can achieve surprisingly good results on many challenging tasks.
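
A compact sketch of a hash-based count bonus using SimHash-style random projections; the number of bits and the bonus scale beta are arbitrary choices here, not the paper's tuned values.

import numpy as np
from collections import defaultdict

class HashCountBonus:
    """Count-based exploration bonus over a locality-sensitive hash of the state."""
    def __init__(self, state_dim, n_bits=32, beta=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.standard_normal((n_bits, state_dim))  # fixed random projection
        self.counts = defaultdict(int)
        self.beta = beta

    def __call__(self, state):
        code = tuple((self.A @ np.asarray(state) > 0).astype(int))  # sign bits as hash code
        self.counts[code] += 1
        return self.beta / np.sqrt(self.counts[code])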

On Bonus Based Exploration Methods In The Arcade Learning Environment

The results suggest that recent gains in Montezuma's Revenge may be better attributed to architecture change, rather than better exploration schemes; and that the real pace of progress in exploration research for Atari 2600 games may have been obfuscated by good results on a single domain.
...