Corpus ID: 53102049

Model-Based Active Exploration

@article{Shyam2019ModelBasedAE,
  title={Model-Based Active Exploration},
  author={Pranav Shyam and Wojciech Jaśkowski and Faustino J. Gomez},
  journal={ArXiv},
  year={2019},
  volume={abs/1810.12162}
}
Efficient exploration is an unsolved problem in Reinforcement Learning which is usually addressed by reactively rewarding the agent for fortuitously encountering novel situations. This paper introduces an efficient active exploration algorithm, Model-Based Active eXploration (MAX), which uses an ensemble of forward models to plan to observe novel events. This is carried out by optimizing agent behaviour with respect to a measure of novelty derived from the Bayesian perspective of exploration…
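As a rough illustration of the idea in the abstract (a minimal sketch, not the authors' implementation), the novelty of an imagined transition can be scored by the disagreement among an ensemble of learned forward models, for example a Jensen-Shannon-style divergence between their predictive distributions over the next state; a planner then prefers action sequences expected to maximize this utility. The sketch assumes discrete predictive distributions for simplicity, and the names `novelty` and `pred_dists` are placeholders.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = np.clip(p, eps, 1.0)
    return float(-np.sum(p * np.log(p)))

def novelty(pred_dists):
    """Jensen-Shannon-style disagreement across an ensemble of forward models:
    entropy of the averaged prediction minus the average entropy of the
    individual predictions. `pred_dists` has shape (n_models, n_outcomes),
    one predictive distribution over next states per model."""
    mixture = pred_dists.mean(axis=0)
    return entropy(mixture) - float(np.mean([entropy(p) for p in pred_dists]))

# Toy check: models that agree imply little left to learn (novelty ~ 0),
# while disagreement marks a transition worth visiting (novelty > 0).
agree = np.array([[0.9, 0.1], [0.9, 0.1]])
disagree = np.array([[0.9, 0.1], [0.1, 0.9]])
print(novelty(agree), novelty(disagree))  # ~0.00 vs ~0.37
```

Per the abstract, the forward models are used to plan to observe novel events, i.e. the utility is evaluated on transitions imagined inside the learned models rather than only on transitions the agent has already stumbled upon.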
Citations

Receding Horizon Curiosity
TLDR
An effective trajectory-optimization-based approximate solution to an otherwise intractable problem that models optimal exploration in an unknown Markov decision process (MDP) by interleaving episodic exploration with Bayesian nonlinear system identification.
Reinforcement Learning through Active Inference
The central tenet of reinforcement learning (RL) is that agents seek to maximize the sum of cumulative rewards. In contrast, active inference, an emerging framework within cognitive and computational…
Sample Efficient Reinforcement Learning via Model-Ensemble Exploration and Exploitation
TLDR
MEEE is presented, a model-ensemble method consisting of optimistic exploration and weighted exploitation, which outperforms other state-of-the-art model-free and model-based methods, especially in sample complexity.
Explicit Explore-Exploit Algorithms in Continuous State Spaces
TLDR
It is shown that under realizability and optimal planning assumptions, the algorithm provably finds a near-optimal policy with a number of samples that is polynomial in a structural complexity measure which is shown to be low in several natural settings.
MADE: Exploration via Maximizing Deviation from Explored Regions
TLDR
This work proposes a new exploration approach via maximizing the deviation of the occupancy of the next policy from the explored regions, giving rise to a new intrinsic reward that adjusts existing bonuses.
Scaling Active Inference
TLDR
This work presents a working implementation of active inference that applies to high-dimensional tasks, with proof-of-principle results demonstrating efficient exploration and an order of magnitude increase in sample efficiency over strong model-free baselines.
SAMBA: Safe Model-Based & Active Reinforcement Learning
In this paper, we propose SAMBA, a novel framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics. Our method builds upon PILCO…
Planning to Explore via Self-Supervised World Models
TLDR
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods and, in fact, almost matches the performance of an oracle which has access to rewards.
Self-Supervised Exploration via Disagreement
TLDR
This paper proposes a formulation for exploration inspired by work in the active learning literature: it trains an ensemble of dynamics models and incentivizes the agent to explore such that the disagreement among those models is maximized, which results in sample-efficient exploration.
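The disagreement signal described in the entry above can be sketched, under simplifying assumptions, as the variance of next-state predictions across an ensemble of dynamics models trained on the same data; the names `disagreement_reward` and `next_state_preds` below are illustrative rather than taken from the paper.

```python
import numpy as np

def disagreement_reward(next_state_preds):
    """Intrinsic reward as the ensemble's disagreement about the next state:
    mean (over state dimensions) of the variance across ensemble members.
    `next_state_preds` has shape (n_models, state_dim), one predicted next
    state per dynamics model."""
    return float(np.mean(np.var(next_state_preds, axis=0)))

# Toy usage: agreeing models give (near) zero reward; disagreeing models
# give a positive reward that pushes the agent toward poorly-modelled
# transitions.
agree = np.array([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])
disagree = np.array([[1.0, 2.0], [0.0, 3.0], [2.0, 1.0]])
print(disagreement_reward(agree), disagreement_reward(disagree))  # 0.0 vs ~0.67
```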

References

Showing 1-10 of 49 references
Efficient Exploration in Reinforcement Learning
J. Langford, Encyclopedia of Machine Learning and Data Mining, 2017
TLDR
Exploration is a key aspect of reinforcement learning that is missing from standard supervised learning settings; minimizing the number of information-gathering actions helps optimize the standard goal in reinforcement learning.
VIME: Variational Information Maximizing Exploration
TLDR
VIME is introduced, an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics, which efficiently handles continuous state and action spaces and can be applied with several different underlying RL algorithms.
Self-Supervised Exploration via Disagreement
TLDR
This paper proposes a formulation for exploration inspired by work in the active learning literature: it trains an ensemble of dynamics models and incentivizes the agent to explore such that the disagreement among those models is maximized, which results in sample-efficient exploration.
An information-theoretic approach to curiosity-driven reinforcement learning
TLDR
It is shown that Boltzmann-style exploration, one of the main exploration methods used in reinforcement learning, is optimal from an information-theoretic point of view, in that it optimally trades expected return for the coding cost of the policy.
Unifying Count-Based Exploration and Intrinsic Motivation
TLDR
This work uses density models to measure uncertainty, and proposes a novel algorithm for deriving a pseudo-count from an arbitrary density model, which enables count-based exploration algorithms to be generalized to the non-tabular case (a minimal sketch of the pseudo-count bonus follows this reference list).
Explorations in efficient reinforcement learning
TLDR
Reinforcement learning methods are described which can solve sequential decision-making problems by learning from trial and error; different categories of problems are described and new methods for solving them are introduced.
Intrinsically motivated model learning for developing curious robots
TLDR
Experiments show that combining the agent's intrinsic rewards with external task rewards enables the agent to learn faster than using external rewards alone, and the applicability of this approach to learning on robots is presented.
Deep Exploration via Bootstrapped DQN
Efficient exploration in complex environments remains a major challenge for reinforcement learning. We propose bootstrapped DQN, a simple algorithm that explores in a computationally and…
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
TLDR
This paper proposes a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation, which matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks, while requiring significantly fewer samples.
PILCO: A Model-Based and Data-Efficient Approach to Policy Search
TLDR
PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way by learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning.
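As a minimal, hedged sketch of the pseudo-count idea referenced above ("Unifying Count-Based Exploration and Intrinsic Motivation"): given a density model that assigns probability rho to a state before observing it and rho' after one additional observation of that state, the derived pseudo-count is rho * (1 - rho') / (rho' - rho), and an exploration bonus can be taken roughly proportional to 1 / sqrt(pseudo-count). The empirical-frequency "density model" below is only a sanity check, not the density model used in the paper, and the bonus constant is an assumed value.

```python
import math
from collections import Counter

def pseudo_count(rho, rho_prime):
    """Pseudo-count derived from a density model:
    rho       - probability the model assigns to state x before observing it,
    rho_prime - probability assigned to x after one extra observation of x."""
    return rho * (1.0 - rho_prime) / (rho_prime - rho)

def exploration_bonus(n_hat, beta=0.05):
    """Count-based intrinsic reward, larger for rarely visited states.
    beta is a tuning constant (an assumption here, not from the paper)."""
    return beta / math.sqrt(n_hat + 0.01)

# Sanity check with an empirical-frequency model: the pseudo-count recovers
# the true visit count of the state.
visits = Counter(["s0", "s0", "s1"])
n = sum(visits.values())
rho = visits["s0"] / n                  # 2/3
rho_prime = (visits["s0"] + 1) / (n + 1)  # 3/4
print(pseudo_count(rho, rho_prime))       # ~2.0, the true count of "s0"
print(exploration_bonus(pseudo_count(rho, rho_prime)))
```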