Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning

Tian Tan, Zhihan Xiong, Vikranth Reddy Dwaracherla
It is well known that quantifying uncertainty in the action-value estimates is crucial for efficient exploration in reinforcement learning. Ensemble sampling offers a relatively computationally tractable way of doing this using randomized value functions. However, it still requires a huge amount of computational resources for complex problems. In this paper, we present an alternative, computationally efficient way to induce exploration using index sampling. We use an indexed value function to… 

Multi-Agent Bootstrapped Deep Q-Network for Large-Scale Traffic Signal Control

This paper adopts the bootstrapped Deep Q-Network (DQN) algorithm to induce exploration via an ensemble of behavior policies, and it outperforms the vanilla DQN in both efficiency and robustness on a handcrafted asymmetric isolated intersection.

Randomized Value Functions via Multiplicative Normalizing Flows

This work leverages recent advances in variational Bayesian neural networks and combines them with traditional Deep Q-Networks (DQN) and Deep Deterministic Policy Gradient (DDPG) to obtain randomized value functions for high-dimensional domains, performing approximate Thompson sampling in a computationally efficient manner via stochastic gradient methods.

Randomized Prior Functions for Deep Reinforcement Learning

This paper shows that the approach is efficient with linear representations, provides simple illustrations of its efficacy with nonlinear representations, and shows that it scales to large-scale problems far better than previous attempts.

Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

This paper introduces a tractable, sample-based method for approximate Bayes-optimal planning which exploits Monte-Carlo tree search and shows it working in an infinite state space domain which is qualitatively out of reach of almost all previous work in Bayesian exploration.

The Uncertainty Bellman Equation and Exploration

It is proved that the unique fixed point of the UBE yields an upper bound on the variance of the posterior distribution of the Q-values induced by any policy, which can be much tighter than traditional count-based bonuses that compound standard deviation rather than variance.

Noisy Networks for Exploration

It is found that replacing the conventional exploration heuristics for A3C, DQN and dueling agents with NoisyNet yields substantially higher scores for a wide range of Atari games, in some cases advancing the agent from sub to super-human performance.

Why is Posterior Sampling Better than Optimism for Reinforcement Learning?

A Bayesian expected regret bound for PSRL in finite-horizon episodic Markov decision processes is established, which improves upon the best previous bound of $\tilde{O}(H S \sqrt{AT})$ for any reinforcement learning algorithm.

Deep Exploration via Randomized Value Functions

A regret bound that establishes statistical efficiency with a tabular representation is proved, which offers an elegant means for synthesizing statistically and computationally efficient exploration with common practical approaches to value function learning.

Deep Exploration via Bootstrapped DQN

Efficient exploration in complex environments remains a major challenge for reinforcement learning. We propose bootstrapped DQN, a simple algorithm that explores in a computationally and

Bayesian Reinforcement Learning: A Survey

An in-depth review of the role of Bayesian methods for the reinforcement learning (RL) paradigm, and a comprehensive survey on Bayesian RL algorithms and their theoretical and empirical properties.

Behaviour Suite for Reinforcement Learning

This paper introduces the Behaviour Suite for Reinforcement Learning, or bsuite for short. bsuite is a collection of carefully-designed experiments that investigate core capabilities of reinforcement