# Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning

```bibtex
@inproceedings{Tan2020ParameterizedIV,
  title={Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning},
  author={Tian Tan and Zhihan Xiong and Vikranth Reddy Dwaracherla},
  booktitle={AAAI},
  year={2020}
}
```
• Published in AAAI, 23 December 2019
• Computer Science
It is well known that quantifying uncertainty in action-value estimates is crucial for efficient exploration in reinforcement learning. Ensemble sampling offers a relatively tractable way of doing this with randomized value functions, but it still demands substantial computational resources on complex problems. In this paper, we present an alternative, computationally efficient way to induce exploration using index sampling. We use an indexed value function to…
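The contrast the abstract draws between ensemble sampling and index sampling can be sketched on a toy bandit. The sketch below is illustrative only, not the authors' implementation: the Gaussian mean/scale index parameterization at the end is an assumed stand-in for the paper's parameterized indexed value function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-armed bandit standing in for action-value estimation.
true_means = np.array([0.2, 0.5, 0.8])
K, n_actions = 10, 3

# Ensemble sampling: maintain K value estimates; each step, sample one
# member and act greedily under it (approximate Thompson sampling).
Q = rng.normal(0.0, 1.0, size=(K, n_actions))   # randomized initial values
counts = np.zeros((K, n_actions))

for t in range(2000):
    k = rng.integers(K)                  # sample an ensemble member ...
    a = int(np.argmax(Q[k]))             # ... and act greedily under it
    r = true_means[a] + rng.normal(0, 0.1)
    for j in range(K):                   # bootstrapped update: each member
        if rng.random() < 0.5:           # sees the transition with prob 1/2
            counts[j, a] += 1
            Q[j, a] += (r - Q[j, a]) / counts[j, a]

# Index sampling replaces the K separate members with a single family of
# value functions Q(a; z) indexed by a random variable z; sampling z plays
# the role of sampling a member, at a fraction of the memory and compute.
# Here the family is an assumed Gaussian mean/scale parameterization.
mu, scale = Q.mean(axis=0), Q.std(axis=0)
z = rng.normal(size=n_actions)
a_indexed = int(np.argmax(mu + scale * z))
```

The memory saving is the point: the ensemble stores `K` full value tables, while the indexed family stores one mean and one scale per action regardless of how many indices are ever sampled.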
## Citations (2)

### Multi-Agent Bootstrapped Deep Q-Network for Large-Scale Traffic Signal Control

• Computer Science
2020 IEEE Conference on Control Technology and Applications (CCTA)
• 2020
This paper adopts the bootstrapped Deep Q-Network (DQN) algorithm to induce exploration via an ensemble of behavior policies, and shows that it outperforms the vanilla DQN in both efficiency and robustness on a handcrafted asymmetric isolated intersection.

## References

Showing 1-10 of 30 references

### Randomized Value Functions via Multiplicative Normalizing Flows

• Computer Science
UAI
• 2019
This work leverages recent advances in variational Bayesian neural networks, combining them with Deep Q-Networks (DQN) and Deep Deterministic Policy Gradient (DDPG) to obtain randomized value functions for high-dimensional domains, and thereby performs approximate Thompson sampling efficiently via stochastic gradient methods.

### Randomized Prior Functions for Deep Reinforcement Learning

• Computer Science
NeurIPS
• 2018
It is shown that this approach is efficient with linear representations, provides simple illustrations of its efficacy with nonlinear representations and scales to large-scale problems far better than previous attempts.
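The mechanism this reference describes, adding a fixed, randomly initialized "prior" network to a trainable one, can be sketched with linear features. This is an illustrative assumption for brevity; the paper itself uses deep networks, and the member seeds and dimensions below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Randomized prior functions, linear-feature sketch. Each ensemble member
# predicts Q_k(x) = f_k(x) + beta * p_k(x), where p_k has randomly drawn,
# never-trained weights and f_k is trained so that the *sum* fits the data.
d, beta, n_train = 4, 1.0, 2          # underdetermined: 2 points, 4 dims
X = rng.normal(size=(n_train, d))
y = rng.normal(size=n_train)          # toy regression targets

def fit_member(seed):
    prior = np.random.default_rng(seed).normal(size=d)  # fixed prior weights
    resid = y - beta * (X @ prior)                      # fit f to the residual
    f, *_ = np.linalg.lstsq(X, resid, rcond=None)       # min-norm solution
    return f, prior

def predict(member, x):
    f, prior = member
    return f @ x + beta * (prior @ x)

m1, m2 = fit_member(10), fit_member(11)

# Both members reproduce the training targets exactly ...
fit_err = max(abs(predict(m1, X[0]) - y[0]), abs(predict(m2, X[0]) - y[0]))
# ... but their different priors make them disagree away from the data,
# and that disagreement is what drives exploration.
x_new = rng.normal(size=d)
disagreement = abs(predict(m1, x_new) - predict(m2, x_new))
```

The design choice being illustrated: because the prior is never updated, each member's uncertainty cannot be trained away in regions the data does not constrain.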

### Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

• Computer Science
NIPS
• 2012
This paper introduces a tractable, sample-based method for approximate Bayes-optimal planning which exploits Monte-Carlo tree search and shows it working in an infinite state space domain which is qualitatively out of reach of almost all previous work in Bayesian exploration.

### The Uncertainty Bellman Equation and Exploration

• Computer Science
ICML
• 2018
It is proved that the unique fixed point of the UBE yields an upper bound on the variance of the posterior distribution of the Q-values induced by any policy; the resulting exploration bonus can be much tighter than traditional count-based bonuses, which compound standard deviation rather than variance.

### Noisy Networks for Exploration

• Computer Science
ICLR
• 2018
It is found that replacing the conventional exploration heuristics for A3C, DQN, and dueling agents with NoisyNet yields substantially higher scores across a wide range of Atari games, in some cases advancing the agent from sub- to super-human performance.
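The NoisyNet idea, replacing a layer's deterministic weights with learnable mean/scale parameters perturbed by factorized Gaussian noise, might be sketched as follows. This is a minimal numpy version under stated assumptions: the paper trains these layers end-to-end inside DQN/A3C agents, whereas here only the forward pass is shown; `sigma0 = 0.5` follows the paper's factorized-noise initialization.

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):
    # Noise-scaling function from the paper: f(x) = sgn(x) * sqrt(|x|).
    return np.sign(x) * np.sqrt(np.abs(x))

class NoisyLinear:
    """Linear layer whose weights are mu + sigma * eps, eps resampled
    per forward pass, so exploration comes from the parameters rather
    than from an epsilon-greedy wrapper."""

    def __init__(self, n_in, n_out, sigma0=0.5):
        bound = 1.0 / np.sqrt(n_in)
        self.mu_w = rng.uniform(-bound, bound, (n_out, n_in))
        self.mu_b = rng.uniform(-bound, bound, n_out)
        self.sig_w = np.full((n_out, n_in), sigma0 / np.sqrt(n_in))
        self.sig_b = np.full(n_out, sigma0 / np.sqrt(n_in))

    def __call__(self, x, noisy=True):
        if not noisy:
            return self.mu_w @ x + self.mu_b
        # Factorized noise: one vector per input, one per output,
        # combined by outer product instead of a full noise matrix.
        eps_in = f(rng.normal(size=self.mu_w.shape[1]))
        eps_out = f(rng.normal(size=self.mu_w.shape[0]))
        w = self.mu_w + self.sig_w * np.outer(eps_out, eps_in)
        b = self.mu_b + self.sig_b * eps_out
        return w @ x + b

layer = NoisyLinear(3, 2)
x = np.ones(3)
out_a, out_b = layer(x), layer(x)   # two noisy passes give different outputs
```

Because the scale parameters are learned, the agent can anneal its own noise per weight during training instead of following a hand-tuned exploration schedule.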

### Why is Posterior Sampling Better than Optimism for Reinforcement Learning?

• Computer Science
ICML
• 2017
A Bayesian expected regret bound of $\tilde{O}(H \sqrt{SAT})$ for PSRL in finite-horizon episodic Markov decision processes is established, which improves upon the best previous bound of $\tilde{O}(H S \sqrt{AT})$ for any reinforcement learning algorithm.

### Deep Exploration via Randomized Value Functions

• Computer Science
J. Mach. Learn. Res.
• 2019
A regret bound establishing statistical efficiency with a tabular representation is proved, and randomized value functions are shown to offer an elegant means of synthesizing statistically and computationally efficient exploration with common practical approaches to value function learning.

### Deep Exploration via Bootstrapped DQN

• Computer Science
NIPS
• 2016
Efficient exploration in complex environments remains a major challenge for reinforcement learning. We propose bootstrapped DQN, a simple algorithm that explores in a computationally and statistically efficient manner through use of randomized value functions.

### Bayesian Reinforcement Learning: A Survey

• Computer Science
Found. Trends Mach. Learn.
• 2015
This survey provides an in-depth review of the role of Bayesian methods in the reinforcement learning (RL) paradigm, together with a comprehensive account of Bayesian RL algorithms and their theoretical and empirical properties.

### Behaviour Suite for Reinforcement Learning

• Computer Science
ICLR
• 2020
This paper introduces the Behaviour Suite for Reinforcement Learning, or bsuite for short: a collection of carefully designed experiments that investigate core capabilities of reinforcement learning agents.