• Computer Science, Mathematics
  • Published in ICML 2017

Minimax Regret Bounds for Reinforcement Learning

@inproceedings{Azar2017MinimaxRB,
  title={Minimax Regret Bounds for Reinforcement Learning},
  author={Mohammad Gheshlaghi Azar and Ian Osband and R{\'e}mi Munos},
  booktitle={ICML},
  year={2017}
}
Abstract

We consider the problem of provably optimal exploration in reinforcement learning for finite-horizon MDPs. We show that an optimistic modification to value iteration achieves a regret bound of $\tilde{O}(\sqrt{HSAT} + H^2S^2A + H\sqrt{T})$, where $H$ is the time horizon, $S$ the number of states, $A$ the number of actions, and $T$ the number of time-steps. This result improves over the best previously known bound, $\tilde{O}(HS\sqrt{AT})$, achieved by the UCRL2 algorithm of Jaksch et al., 2010. …
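
As a quick consequence of the stated bound, the leading $\sqrt{HSAT}$ term dominates $H^2S^2A$ once $T \ge H^3S^3A$ (square both sides: $HSAT \ge H^4S^4A^2$), and dominates $H\sqrt{T}$ whenever $SA \ge H$. To make the algorithmic idea concrete, below is a minimal sketch of optimism-based value iteration for a finite-horizon tabular MDP, in the spirit of the approach the abstract describes. It uses a generic Hoeffding-style exploration bonus rather than the paper's exact bonus construction, and every name in it (optimistic_value_iteration, counts, trans_counts, delta) is an illustrative assumption, not the authors' code.

import numpy as np

def optimistic_value_iteration(counts, rewards_sum, trans_counts, H, delta=0.05):
    """One optimistic planning pass over empirical estimates.

    counts[s, a]       -- visit counts N(s, a)
    rewards_sum[s, a]  -- summed observed rewards for (s, a), rewards in [0, 1]
    trans_counts[s, a] -- observed next-state counts, shape (S, A, S)
    """
    S, A = counts.shape
    n = np.maximum(counts, 1)                      # avoid division by zero
    r_hat = rewards_sum / n                        # empirical mean rewards
    p_hat = trans_counts / n[:, :, None]           # empirical transition probabilities
    # Generic Hoeffding-style bonus for values bounded by H (not the paper's exact term).
    bonus = H * np.sqrt(np.log(S * A * H / delta) / n)

    Q = np.zeros((H + 1, S, A))
    V = np.zeros((H + 1, S))
    for h in range(H - 1, -1, -1):                 # backward induction over the horizon
        # Optimistic Bellman backup, clipped at H (the maximum possible return).
        Q[h] = np.minimum(float(H), r_hat + bonus + p_hat @ V[h + 1])
        V[h] = Q[h].max(axis=1)
    return Q, V

# Example call on a toy 3-state, 2-action, horizon-5 MDP with no data yet:
# all bonuses are maximal, so every Q-value is optimistically clipped to H.
Q, V = optimistic_value_iteration(
    counts=np.zeros((3, 2)),
    rewards_sum=np.zeros((3, 2)),
    trans_counts=np.zeros((3, 2, 3)),
    H=5,
)

In a complete episodic algorithm, this planning pass would be rerun at the start of each episode with updated counts, and the agent would act greedily with respect to the optimistic Q; rarely visited state-action pairs carry large bonuses and therefore look attractive, which is what drives exploration.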

Citations

Publications citing this paper.
SHOWING A SAMPLE OF 70 CITATIONS

Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs

15 EXCERPTS
CITES BACKGROUND & METHODS
HIGHLY INFLUENCED

A Tractable Algorithm for Finite-Horizon Continuous Reinforcement Learning

  • 2019 2nd International Conference on Intelligent Autonomous Systems (ICoIAS)
7 EXCERPTS
CITES BACKGROUND & METHODS
HIGHLY INFLUENCED

Efficient Policy Learning for Non-Stationary MDPs under Adversarial Manipulation

10 EXCERPTS
CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

Is Q-learning Provably Efficient?

6 EXCERPTS
CITES BACKGROUND & METHODS
HIGHLY INFLUENCED

Minimal Exploration in Episodic Reinforcement Learning

4 EXCERPTS
CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

CITATION STATISTICS

  • 19 highly influenced citations

  • An average of 23 citations per year from 2017 through 2019

  • A 56% increase in citations per year in 2019 over 2018