Corpus ID: 53102049

Model-Based Active Exploration

@article{Shyam2019ModelBasedAE,
  title={Model-Based Active Exploration},
  author={Pranav Shyam and Wojciech Jaśkowski and Faustino J. Gomez},
  journal={ArXiv},
  year={2019},
  volume={abs/1810.12162}
}
Efficient exploration is an unsolved problem in Reinforcement Learning which is usually addressed by reactively rewarding the agent for fortuitously encountering novel situations. This paper introduces an efficient active exploration algorithm, Model-Based Active eXploration (MAX), which uses an ensemble of forward models to plan to observe novel events. This is carried out by optimizing agent behaviour with respect to a measure of novelty derived from the Bayesian perspective of exploration… 
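
As a rough, hypothetical sketch of the core idea (not the authors' implementation), an ensemble of learned forward models can be used to score candidate action sequences by how much its members disagree about the imagined future; the `predict` interface and the variance-based novelty proxy below are illustrative assumptions, whereas MAX itself uses a Bayesian novelty utility derived from the ensemble's predictive distributions.

```python
import numpy as np

# Illustrative sketch (not the authors' code): plan toward novelty by scoring
# candidate action sequences with the disagreement of an ensemble of learned
# forward models. The variance across member predictions is a simple stand-in
# for MAX's Bayesian novelty utility; `predict(state, action)` is an assumed
# interface returning a predicted next state.

def exploration_utility(ensemble, state, action_sequence):
    """Total predicted novelty accumulated along an imagined rollout."""
    total = 0.0
    for action in action_sequence:
        preds = np.stack([m.predict(state, action) for m in ensemble])  # (n_models, state_dim)
        total += float(preds.var(axis=0).mean())   # disagreement = novelty proxy
        state = preds.mean(axis=0)                 # step the imagined state forward
    return total

def plan_exploration(ensemble, state, candidate_sequences):
    """Pick the action sequence expected to uncover the most novelty."""
    utilities = [exploration_utility(ensemble, state, seq) for seq in candidate_sequences]
    return candidate_sequences[int(np.argmax(utilities))]
```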

Receding Horizon Curiosity

An effective trajectory-optimization-based approximate solution to the otherwise intractable problem of optimal exploration in an unknown Markov decision process (MDP), which interleaves episodic exploration with Bayesian nonlinear system identification.

Reinforcement Learning through Active Inference

The central tenet of reinforcement learning (RL) is that agents seek to maximize the sum of cumulative rewards. In contrast, active inference, an emerging framework within cognitive and computational neuroscience…

Sample Efficient Reinforcement Learning via Model-Ensemble Exploration and Exploitation

MEEE, a model-ensemble method that combines optimistic exploration with weighted exploitation, is presented and shown to outperform other state-of-the-art model-free and model-based methods, especially in sample complexity.

Explicit Explore-Exploit Algorithms in Continuous State Spaces

It is shown that under realizability and optimal planning assumptions, the algorithm provably finds a near-optimal policy with a number of samples that is polynomial in a structural complexity measure which is shown to be low in several natural settings.

SAMBA: safe model-based & active reinforcement learning

In this paper, we propose SAMBA, a novel framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics. Our method builds upon PILCO…

Scaling Active Inference

This work presents a working implementation of active inference that applies to high-dimensional tasks, with proof-of-principle results demonstrating efficient exploration and an order of magnitude increase in sample efficiency over strong model-free baselines.

Planning to Explore via Self-Supervised World Models

Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods and, in fact, almost matches the performance of an oracle that has access to rewards.

Self-Supervised Exploration via Disagreement

This paper proposes a formulation for exploration inspired by work in the active learning literature: an ensemble of dynamics models is trained and the agent is incentivized to explore where the disagreement among the ensemble's predictions is maximized, resulting in sample-efficient exploration.
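
A minimal sketch of the disagreement signal described above, assuming a hypothetical `predict(state, action)` interface on each dynamics model: the intrinsic reward is simply the variance of the ensemble's predictions, so it decays automatically once the models agree in a region of the state space.

```python
import numpy as np

# Sketch of a disagreement-based intrinsic reward: the agent is rewarded for
# visiting transitions the dynamics ensemble disagrees about; the reward is
# self-supervised because it requires no task reward at all.

def intrinsic_reward(models, state, action):
    preds = np.stack([m.predict(state, action) for m in models])  # (n_models, state_dim)
    return float(preds.var(axis=0).mean())
```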

Generative Planning for Temporally Coordinated Exploration in Reinforcement Learning

This work presents the Generative Planning Method (GPM), which can generate actions not only for the current step but also for a number of future steps (hence termed generative planning), and demonstrates its effectiveness compared with several baseline methods.
...

References

Showing 1-10 of 46 references

Efficient Exploration in Reinforcement Learning

  • J. Langford
  • Computer Science
    Encyclopedia of Machine Learning
  • 2010
Exploration is a key aspect of reinforcement learning that is missing from standard supervised learning settings; minimizing the number of information-gathering actions helps optimize the standard goal in reinforcement learning.

VIME: Variational Information Maximizing Exploration

VIME is introduced, an exploration strategy based on maximizing information gain about the agent's belief over the environment dynamics; it efficiently handles continuous state and action spaces and can be applied with several different underlying RL algorithms.
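
One common way to write the quantity VIME approximates (using generic notation, not necessarily the paper's) is the per-step information gain about the dynamics parameters, i.e. the KL divergence between the posterior and prior belief after observing a transition:

```latex
% Per-step information gain about dynamics parameters \theta,
% given history \xi_t, action a_t, and observed next state s_{t+1}:
r^{\mathrm{int}}_t \;=\; D_{\mathrm{KL}}\big(\, p(\theta \mid \xi_t, a_t, s_{t+1}) \,\|\, p(\theta \mid \xi_t) \,\big)
```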

Self-Supervised Exploration via Disagreement

This paper proposes a formulation for exploration inspired by work in the active learning literature: an ensemble of dynamics models is trained and the agent is incentivized to explore where the disagreement among the ensemble's predictions is maximized, resulting in sample-efficient exploration.

An information-theoretic approach to curiosity-driven reinforcement learning

It is shown that Boltzmann-style exploration, one of the main exploration methods used in reinforcement learning, is optimal from an information-theoretic point of view, in that it optimally trades expected return for the coding cost of the policy.
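
For reference, Boltzmann (softmax) exploration chooses actions with probability proportional to their exponentiated action values; the temperature $\tau$ controls the return-versus-exploration trade-off the summary refers to:

```latex
\pi(a \mid s) \;=\; \frac{\exp\big(Q(s,a)/\tau\big)}{\sum_{b} \exp\big(Q(s,b)/\tau\big)}
```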

Unifying Count-Based Exploration and Intrinsic Motivation

This work uses density models to measure uncertainty and proposes a novel algorithm for deriving a pseudo-count from an arbitrary density model, enabling count-based exploration algorithms to generalize to the non-tabular case.
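
The pseudo-count construction can be summarized as follows (standard form; symbols are generic): if $\rho(x)$ is the density model's probability of $x$ before observing it and $\rho'(x)$ its "recoding" probability immediately after a single update on $x$, the pseudo-count is

```latex
\hat{N}(x) \;=\; \frac{\rho(x)\,\big(1 - \rho'(x)\big)}{\rho'(x) - \rho(x)}
```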

Explorations in efficient reinforcement learning

Reinforcement learning methods that can solve sequential decision-making problems by learning from trial and error are described; different categories of problems are characterized and new methods for solving them are introduced.

Intrinsically motivated model learning for developing curious robots

Deep Exploration via Bootstrapped DQN

Efficient exploration in complex environments remains a major challenge for reinforcement learning. We propose bootstrapped DQN, a simple algorithm that explores in a computationally and statistically efficient manner through use of randomized value functions.
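
A hypothetical sketch of the bootstrapped-ensemble mechanism (the names and the `q_heads` interface are assumptions, not the paper's code): one of K value heads is sampled per episode and followed greedily, and each transition is assigned to a random subset of heads for training via a bootstrap mask.

```python
import numpy as np

# Sketch of bootstrapped deep exploration: K value heads share data through
# bootstrap masks; acting greedily w.r.t. a randomly sampled head per episode
# yields temporally extended ("deep") exploration without epsilon noise.

K = 10
rng = np.random.default_rng(0)

def start_episode():
    return int(rng.integers(K))                  # head to follow this episode

def act(q_heads, head, state):
    return int(np.argmax(q_heads[head](state)))  # greedy w.r.t. the sampled head

def bootstrap_mask(p=0.5):
    return rng.random(K) < p                     # heads that train on this transition
```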

Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models

This paper proposes a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation, which matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks, while requiring significantly fewer samples.
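
A minimal sketch of the trajectory-sampling idea, assuming a hypothetical `sample_next_state(state, action)` method on each ensemble member and a known reward function: each imagined particle re-samples which model to trust at every step, so both model (epistemic) and noise (aleatoric) uncertainty are propagated into the return estimate used for planning.

```python
import numpy as np

# Sketch of probabilistic-ensemble trajectory sampling (PETS-style, but not
# the authors' code): evaluate an action sequence by averaging sampled
# rollouts, choosing a random ensemble member at each imagined step.

def rollout_return(ensemble, reward_fn, state, actions, rng):
    total = 0.0
    for action in actions:
        model = ensemble[rng.integers(len(ensemble))]   # epistemic uncertainty
        state = model.sample_next_state(state, action)  # aleatoric uncertainty
        total += reward_fn(state, action)
    return total

def plan_value(ensemble, reward_fn, state, actions, rng, n_particles=20):
    return float(np.mean([rollout_return(ensemble, reward_fn, state, actions, rng)
                          for _ in range(n_particles)]))
```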

PILCO: A Model-Based and Data-Efficient Approach to Policy Search

PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way by learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning.