Corpus ID: 7559418

Minimax Regret Bounds for Reinforcement Learning

@inproceedings{Azar2017MinimaxRB,
  title={Minimax Regret Bounds for Reinforcement Learning},
  author={M. G. Azar and Ian Osband and R. Munos},
  booktitle={ICML},
  year={2017}
}
We consider the problem of provably optimal exploration in reinforcement learning for finite-horizon MDPs. We show that an optimistic modification to value iteration achieves a regret bound of $\tilde{O}(\sqrt{HSAT} + H^2S^2A + H\sqrt{T})$, where $H$ is the time horizon, $S$ the number of states, $A$ the number of actions, and $T$ the number of time-steps. This result improves over the best previously known bound $\tilde{O}(HS\sqrt{AT})$, achieved by the UCRL2 algorithm of Jaksch et al., 2010. The…
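To get a feel for the two bounds quoted in the abstract, the following minimal sketch evaluates the expressions inside the $\tilde{O}(\cdot)$ with all constants and logarithmic factors dropped. The function names and the example values of $H$, $S$, $A$ and $T$ are illustrative assumptions, not taken from the paper, and the numbers indicate scaling only, not actual regret.

```python
import math


def new_bound(H, S, A, T):
    """sqrt(HSAT) + H^2 S^2 A + H sqrt(T), with constants and log factors dropped."""
    return math.sqrt(H * S * A * T) + H**2 * S**2 * A + H * math.sqrt(T)


def ucrl2_bound(H, S, A, T):
    """H S sqrt(AT), with constants and log factors dropped."""
    return H * S * math.sqrt(A * T)


def leading_term_ratio(H, S, A, T):
    """Ratio of the UCRL2 expression to the sqrt(HSAT) term; algebraically sqrt(HS)."""
    return ucrl2_bound(H, S, A, T) / math.sqrt(H * S * A * T)


if __name__ == "__main__":
    # Hypothetical problem sizes, chosen only for illustration.
    H, S, A = 10, 20, 5
    for T in (10**4, 10**6, 10**8, 10**10):
        print(f"T={T:>12}  new={new_bound(H, S, A, T):.3e}  "
              f"ucrl2={ucrl2_bound(H, S, A, T):.3e}")
```

For small $T$ the $T$-independent term $H^2S^2A$ dominates the new expression, so the improvement over $HS\sqrt{AT}$ only shows up once $T$ is large; in that regime the gap in the leading terms is a factor of $\sqrt{HS}$.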
254 Citations
Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function
  • 26
Minimax Optimal Reinforcement Learning for Discounted MDPs
  • 1
  • Highly Influenced
Optimistic posterior sampling for reinforcement learning: worst-case regret bounds
  • 111
Q-learning with Logarithmic Regret
  • 6
Regret Bounds for Discounted MDPs
  • 4
  • Highly Influenced
Variance Reduction Methods for Sublinear Reinforcement Learning
  • 28
Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs
  • 30
Naive Exploration is Optimal for Online LQR
  • 37
Learning Near Optimal Policies with Low Inherent Bellman Error
  • 37
Variance-reduced Q-learning is minimax optimal
  • 22

References

Why is Posterior Sampling Better than Optimism for Reinforcement Learning?
  • 117
Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning
  • 133
Near-optimal Regret Bounds for Reinforcement Learning
  • 733
  • Highly Influential
Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model
  • 132
R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning
  • 1,069
On Lower Bounds for Regret in Reinforcement Learning
  • 62
Near-Optimal Reinforcement Learning in Polynomial Time
  • 899
UBEV - A More Practical Algorithm for Episodic RL with Near-Optimal PAC and Regret Guarantees
  • 3
PAC Bounds for Discounted MDPs
  • 83