Corpus ID: 210839465

Reinforcement Learning with Probabilistically Complete Exploration

Philippe Morere, Gilad Francis, Tom Blau, Fabio Ramos
Balancing exploration and exploitation remains a key challenge in reinforcement learning (RL). State-of-the-art RL algorithms suffer from high sample complexity, particularly in the sparse reward case, where they can do no better than to explore in all directions until the first positive rewards are found. To mitigate this, we propose Rapidly Randomly-exploring Reinforcement Learning (R3L). We formulate exploration as a search problem and leverage widely-used planning algorithms such as Rapidly… 
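The abstract frames exploration as a search problem driven by sampling-based planners such as Rapidly-exploring Random Trees. Purely as background, and not the authors' R3L implementation, a minimal RRT-style exploration loop over an assumed 2D unit-square state space might look like this (the function name, step size, and environment are all invented for the sketch):

```python
import math
import random

def rrt_explore(start, step_size=0.1, n_iters=200, bounds=(0.0, 1.0), seed=0):
    """Grow a search tree RRT-style: sample a random target state, find the
    nearest node already in the tree, and extend a short step toward it.
    Repeated sampling spreads the tree rapidly across unexplored space."""
    rng = random.Random(seed)
    tree = [start]  # explored states; parent links omitted for brevity
    for _ in range(n_iters):
        target = (rng.uniform(*bounds), rng.uniform(*bounds))
        nearest = min(tree, key=lambda s: math.dist(s, target))
        d = math.dist(nearest, target)
        if d == 0.0:
            continue
        step = min(step_size, d)  # do not overshoot the sampled target
        new = (nearest[0] + step * (target[0] - nearest[0]) / d,
               nearest[1] + step * (target[1] - nearest[1]) / d)
        tree.append(new)
    return tree

tree = rrt_explore((0.5, 0.5))
```

Because targets are sampled uniformly, tree growth is biased toward large unexplored regions, which is the property that makes RRT-style search attractive for sparse-reward exploration.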

Figures and Tables from this paper

PBCS: Efficient Exploration and Exploitation Using a Synergy between Reinforcement Learning and Motion Planning
This paper proposes a new algorithm that combines motion planning and reinforcement learning to solve hard exploration environments, showing that the method outperforms state-of-the-art RL algorithms in 2D maze environments of various sizes and improves on the trajectory obtained by the motion planning phase.
Artificial Neural Networks and Machine Learning – ICANN 2020: 29th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 15–18, 2020, Proceedings, Part II
A Fine-grained Channel Pruning method that allows any channels to be pruned independently, to avoid the misalignment problem between convolution and skip connection, and can achieve better performance than other state-of-the-art methods in terms of parameter and computation cost.


Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
This paper considers the challenging Atari games domain and proposes a new exploration method that assigns exploration bonuses from a concurrently learned model of the system dynamics; the method provides the most consistent improvement across a range of games that pose a major challenge for prior methods.
Overcoming Exploration in Reinforcement Learning with Demonstrations
This work uses demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm.
VIME: Variational Information Maximizing Exploration
VIME is introduced, an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics which efficiently handles continuous state and action spaces and can be applied with several different underlying RL algorithms.
Bayesian RL for Goal-Only Rewards
The proposed algorithm (EMU-Q) achieves data-efficient exploration and balances exploration and exploitation explicitly at the policy level, granting users more control over the learning process.
Go-Explore: a New Approach for Hard-Exploration Problems
A new algorithm called Go-Explore exploits the following principles: remember previously visited states, solve simulated environments through any available means, and robustify via imitation learning; this results in a dramatic performance improvement on hard-exploration problems.
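The "remember previously visited states and return to them" principle can be illustrated with a toy archive-based loop. This is an invented sketch on a 1D chain environment, not the published Go-Explore system; all names here are hypothetical:

```python
import random

def go_explore_chain(n_states=20, n_iters=300, rollout_len=5, seed=0):
    """Toy Go-Explore-style loop on a 1D chain: keep an archive mapping each
    reached state to the shortest known action sequence, 'return' to an
    archived state by replaying that sequence (deterministic here), then
    explore randomly from it and archive any new or cheaper discoveries."""
    rng = random.Random(seed)
    archive = {0: []}  # state -> shortest action sequence found so far
    for _ in range(n_iters):
        state = rng.choice(list(archive))   # select an archived state
        traj = list(archive[state])         # restore by replaying its path
        for _ in range(rollout_len):        # explore from there
            a = rng.choice([-1, 1])
            state = max(0, min(n_states - 1, state + a))
            traj.append(a)
            if state not in archive or len(traj) < len(archive[state]):
                archive[state] = list(traj)
    return archive

archive = go_explore_chain()
```

Separating "returning" from "exploring" is the key design choice: random actions are only spent at the frontier of what is already reachable, instead of re-exploring well-known regions.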
Learning to Plan via Neural Exploration-Exploitation Trees
A meta path planning algorithm which can utilize prior experience to drastically reduce the sample requirement for solving new path planning problems; it completes planning tasks with very small search trees and significantly outperforms previous state-of-the-art methods on several benchmark problems.
Generalization and Exploration via Randomized Value Functions
The results suggest that randomized value functions offer a promising approach to tackling a critical challenge in reinforcement learning: synthesizing efficient exploration and effective generalization.
Using trajectory data to improve bayesian optimization for reinforcement learning
This work shows how to more effectively apply Bayesian Optimization to RL by exploiting the sequential trajectory information generated by RL agents, and shows that the model-based approach developed can recover from model inaccuracies when good transition and reward models cannot be learned.
Learning to Plan in High Dimensions via Neural Exploration-Exploitation Trees
This work proposes a meta path planning algorithm named NEXT, which exploits a novel neural architecture which can learn promising search directions from problem structures and is integrated into a UCB-type algorithm to achieve an online balance between exploration and exploitation when solving a new problem.
DORA The Explorer: Directed Outreaching Reinforcement Action-Selection
This work proposes $E$-values, a generalization of counters that can be used to evaluate the propagating exploratory value over state-action trajectories, and shows that using $E$-values improves learning and performance over traditional counters.