Corpus ID: 56177829

Information-Directed Exploration for Deep Reinforcement Learning

@article{Nikolov2019InformationDirectedEF,
  title={Information-Directed Exploration for Deep Reinforcement Learning},
  author={Nikolay Nikolov and Johannes Kirschner and Felix Berkenkamp and Andreas Krause},
  journal={ArXiv},
  year={2019},
  volume={abs/1812.07544}
}
Efficient exploration remains a major challenge for reinforcement learning. One reason is that the variability of the returns often depends on the current state and action, and is therefore heteroscedastic. Classical exploration strategies such as upper confidence bound algorithms and Thompson sampling fail to appropriately account for heteroscedasticity, even in the bandit setting. Motivated by recent findings that address this issue in bandits, we propose to use Information-Directed Sampling…
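To make the exploration rule concrete, here is a minimal Python/NumPy sketch of a deterministic IDS-style action selection for a finite-armed bandit, assuming per-action estimates of the mean return, the epistemic (parametric) uncertainty, and the aleatoric (heteroscedastic return-noise) scale are already available. The function name ids_action and the confidence width z are illustrative assumptions, not the paper's exact deep-RL algorithm.

import numpy as np

def ids_action(mu, sigma_epistemic, sigma_aleatoric, z=1.0, eps=1e-8):
    """Sketch of a deterministic IDS-style rule: minimise a squared regret
    surrogate over the information gain (all inputs are per-action arrays)."""
    mu = np.asarray(mu, dtype=float)
    sigma_e = np.asarray(sigma_epistemic, dtype=float)
    sigma_a = np.asarray(sigma_aleatoric, dtype=float)
    upper = mu + z * sigma_e  # optimistic value per action
    lower = mu - z * sigma_e  # pessimistic value per action
    # Surrogate regret gap: best plausible value minus each action's pessimistic value.
    delta = upper.max() - lower
    # Information gain shrinks for actions dominated by aleatoric noise,
    # which UCB and Thompson sampling do not account for.
    info_gain = 0.5 * np.log1p(np.square(sigma_e) / (np.square(sigma_a) + eps))
    ratio = np.square(delta) / (info_gain + eps)
    return int(np.argmin(ratio))

For example, ids_action([0.5, 0.6], [0.05, 0.3], [0.1, 2.0]) selects the first arm even though the second arm has the larger upper confidence bound, because the second arm's large return variability makes each of its observations far less informative.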
Successor Uncertainties: exploration and uncertainty in temporal difference learning
TLDR: Successor Uncertainties (SU), a cheap and easy-to-implement RVF algorithm that retains key properties of PSRL, is designed and outperforms its closest RVF competitor, Bootstrapped DQN, on hard tabular exploration benchmarks.
Sequential Generative Exploration Model for Partially Observable Reinforcement Learning
TLDR: This paper proposes a novel reward-shaping approach that infers intrinsic rewards for the agent from a sequential generative model and formulates the inference procedure for dynamics prediction as a multi-step forward prediction task, where the time abstraction can effectively help to increase the expressiveness of the intrinsic reward signals.
Sequence-Level Intrinsic Exploration Model
  • 2019
Training reinforcement learning policies in partially observable domains with sparse reward signals is an important and open problem for the research community. In this paper, we introduce a new…
Estimating Risk and Uncertainty in Deep Reinforcement Learning
TLDR: This work proposes a method for disentangling epistemic and aleatoric uncertainty in deep reinforcement learning that combines elements of distributional reinforcement learning and approximate Bayesian inference with neural networks, yielding estimates of both types of uncertainty about the expected return of a policy.
Learning Efficient and Effective Exploration Policies with Counterfactual Meta Policy
TLDR: This work formalizes a feasible metric for measuring the utility of exploration based on counterfactual reasoning and proposes an end-to-end algorithm that learns an exploration policy by meta-learning.
Principled Exploration via Optimistic Bootstrapping and Backward Induction
TLDR: OB2I constructs a general-purpose UCB bonus through a non-parametric bootstrap in DRL and propagates future uncertainty in a time-consistent manner through an episodic backward update, which exploits the theoretical advantage and empirically improves sample efficiency.
Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning
TLDR: A novel model-based reinforcement learning algorithm, called BrIdging Reality and Dream (BIRD), is proposed that maximizes the mutual information between imaginary and real trajectories so that the policy improvement learned from imaginary trajectories can be easily generalized to real trajectories.
Dueling Posterior Sampling for Preference-Based Reinforcement Learning
TLDR: A Bayesian approach for the credit assignment problem is developed, translating preferences to a posterior distribution over state-action reward models, and an asymptotic Bayesian no-regret rate is proved for DPS with a Bayesian linear regression credit assignment model.
Measuring Progress in Deep Reinforcement Learning Sample Efficiency
TLDR: This work investigates progress in sample efficiency on Atari games and continuous control tasks by comparing the number of samples that a variety of algorithms need to reach a given performance level, according to the training curves in the corresponding publications.
On the Sample Complexity of Reinforcement Learning with Policy Space Generalization
TLDR: A new notion of eluder dimension for the policy space is proposed, which characterizes the intrinsic complexity of policy learning in an arbitrary Markov Decision Process (MDP), and a near-optimal sample complexity upper bound is proved that depends only linearly on the eluder dimension.

References

Showing 1-10 of 52 references
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
TLDR: This paper considers the challenging Atari games domain and proposes a new exploration method based on assigning exploration bonuses from a concurrently learned model of the system dynamics; the method provides the most consistent improvement across a range of games that pose a major challenge for prior methods.
VIME: Variational Information Maximizing Exploration
TLDR: VIME is introduced, an exploration strategy based on maximizing information gain about the agent's belief over the environment dynamics, which efficiently handles continuous state and action spaces and can be applied with several different underlying RL algorithms.
Efficient Exploration Through Bayesian Deep Q-Networks
TLDR: Bayesian Deep Q-Network (BDQN), a practical Thompson-sampling-based reinforcement learning (RL) algorithm, is proposed; it can be trained with fast closed-form updates, and its samples can be drawn efficiently from a Gaussian distribution.
Parameter Space Noise for Exploration
TLDR: Through an experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete-action environments as well as continuous control tasks, this work demonstrates that RL with parameter noise learns more efficiently than both traditional RL with action-space noise and evolutionary strategies.
Efficient exploration with Double Uncertain Value Networks
TLDR: Experimental results show that both types of uncertainty may vastly improve learning in domains with a strong exploration challenge.
Generalization and Exploration via Randomized Value Functions
TLDR: The results suggest that randomized value functions offer a promising approach to tackling a critical challenge in reinforcement learning: synthesizing efficient exploration and effective generalization.
Deep Reinforcement Learning with Risk-Seeking Exploration
TLDR: This paper proposes a novel DRL algorithm that encourages risk-seeking behaviour to enhance information acquisition during training and demonstrates the merit of the exploration heuristic by arguing that the risk estimator implicitly contains both parametric uncertainty and the inherent uncertainty of the environment.
Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling
TLDR: This work benchmarks well-established and recently developed methods for approximate posterior sampling combined with Thompson sampling over a series of contextual bandit problems and finds that many approaches that are successful in the supervised learning setting underperform in the sequential decision-making scenario.
#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
TLDR: A simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep RL benchmarks, and it is found that simple hash functions can achieve surprisingly good results on many challenging tasks.
Distributional Reinforcement Learning with Quantile Regression
TLDR: A distributional approach to reinforcement learning is built in which the distribution over returns is modeled explicitly instead of only estimating its mean, and a novel distributional reinforcement learning algorithm consistent with the theoretical formulation is presented.
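The referenced quantile-regression approach trains the network to output return quantiles with an asymmetric Huber loss. Below is a minimal NumPy sketch of that loss, assuming the pairwise TD errors are already computed; the name quantile_huber_loss, the Huber threshold kappa, and the simple mean reduction are illustrative assumptions rather than the exact published formulation.

import numpy as np

def quantile_huber_loss(td_errors, taus, kappa=1.0):
    """td_errors: pairwise TD errors u[i, j] = target_sample[i] - predicted_quantile[j];
    taus: quantile midpoints for the predicted quantiles, shape (num_quantiles,)."""
    u = np.asarray(td_errors, dtype=float)
    taus = np.asarray(taus, dtype=float)
    # Huber loss: quadratic near zero, linear in the tails.
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * np.square(u),
                     kappa * (np.abs(u) - 0.5 * kappa))
    # Asymmetric weight pulls each output toward its target quantile level.
    weight = np.abs(taus[np.newaxis, :] - (u < 0).astype(float))
    return (weight * huber / kappa).mean()

Minimising this loss over sampled Bellman targets makes the predicted quantiles approximate the full return distribution, whose spread can then serve as an estimate of the aleatoric (heteroscedastic) return variability discussed in the abstract above.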