# Provably adaptive reinforcement learning in metric spaces

@article{Cao2020ProvablyAR, title={Provably adaptive reinforcement learning in metric spaces}, author={Tongyi Cao and A. Krishnamurthy}, journal={ArXiv}, year={2020}, volume={abs/2006.10875} }

We study reinforcement learning in continuous state and action spaces endowed with a metric. We provide a refined analysis of the algorithm of Sinclair, Banerjee, and Yu (2019) and show that its regret scales with the *zooming dimension* of the instance. This parameter, which originates in the bandit literature, captures the size of the subsets of near-optimal actions and is always smaller than the covering dimension used in previous analyses. As such, our results are the first provably…
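To make the contrast between covering dimension and zooming dimension concrete, here is a minimal sketch on a toy one-dimensional bandit-style problem. It is not the paper's construction: the action space `[0, 1]`, the reward `1 - |a - 0.5|`, the simplified near-optimal set `{a : gap(a) <= r}`, and the helper names `gap`, `greedy_cover_size`, and `covering_numbers` are all assumptions made for illustration. The paper's definition for reinforcement learning is stated in terms of gaps of the optimal Q-function and carries constants that are not reproduced here.

```python
import math


def gap(a):
    """Suboptimality gap of action a for the assumed toy reward 1 - |a - 0.5|."""
    return abs(a - 0.5)


def greedy_cover_size(points, r):
    """Number of radius-r intervals a left-to-right greedy pass uses to cover 1-D points."""
    count = 0
    covered_up_to = -math.inf
    for p in sorted(points):
        if p > covered_up_to:
            # Open a new ball centered at p + r; it covers the interval [p, p + 2r].
            count += 1
            covered_up_to = p + 2 * r
    return count


def covering_numbers(r, n=1024):
    """Return (cover size of the whole grid, cover size of the near-optimal set) at scale r."""
    grid = [i / n for i in range(n + 1)]
    # Simplified near-optimal set (an assumption): actions whose gap is at most r.
    near_optimal = [a for a in grid if gap(a) <= r]
    return greedy_cover_size(grid, r), greedy_cover_size(near_optimal, r)


if __name__ == "__main__":
    for k in range(2, 6):
        r = 2.0 ** (-k)
        whole, near = covering_numbers(r)
        print(f"r = {r}: whole space needs {whole} balls, near-optimal set needs {near}")
```

In this toy instance the number of balls needed for the whole space grows like `1 / (2r)` as `r` shrinks, which corresponds to covering dimension 1, while the near-optimal set is always covered by a single ball, which corresponds to zooming dimension 0. Instances with this kind of gap are where a bound stated in terms of the zooming dimension is tighter.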

#### 2 Citations

Regret Bounds for Adaptive Nonlinear Control

- Computer Science, Engineering
- L4DC
- 2021

The first finite-time regret bounds for adaptive nonlinear control with matched uncertainty in the stochastic setting are proved, showing that the regret suffered by certainty equivalence adaptive control, compared to an oracle controller with perfect knowledge of the unmodeled disturbances, is upper bounded by $\widetilde{O}(\sqrt{T})$ in expectation.

Adaptive Discretization for Model-Based Reinforcement Learning

- Computer Science, Mathematics
- NeurIPS
- 2020

This work introduces the technique of adaptive discretization to design efficient model-based episodic reinforcement learning algorithms in large (potentially continuous) state-action spaces and provides worst-case regret bounds for this algorithm that are competitive with those of state-of-the-art model-based algorithms.

#### References

SHOWING 1-10 OF 21 REFERENCES

Adaptive Discretization for Episodic Reinforcement Learning in Metric Spaces

- Computer Science, Mathematics
- Proc. ACM Meas. Anal. Comput. Syst.
- 2019

This work presents an efficient algorithm for model-free episodic reinforcement learning on large (potentially continuous) state-action spaces, based on a novel Q-learning policy with adaptive data-driven discretization, which recovers the regret guarantees of prior algorithms for continuous state-action spaces.

Zooming for Efficient Model-Free Reinforcement Learning in Metric Spaces

- Computer Science, Mathematics
- ArXiv
- 2020

This paper proposes ZoomRL, an online algorithm that leverages ideas from continuous bandits to learn an adaptive discretization of the joint space by zooming in on more promising and frequently visited regions while carefully balancing the exploration-exploitation trade-off.

Learning to Control in Metric Space with Optimal Regret

- Mathematics, Computer Science
- 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
- 2019

This work provides a surprisingly simple upper-confidence reinforcement learning algorithm that uses a function approximation oracle to estimate optimistic Q functions from experiences and establishes a near-matching regret lower bound.

Exploration in Metric State Spaces

- Mathematics, Computer Science
- ICML
- 2003

We present metric-E3, a provably near-optimal algorithm for reinforcement learning in Markov decision processes in which there is a natural metric on the state space that allows the construction of…

Efficient Model-free Reinforcement Learning in Metric Spaces

- Computer Science, Mathematics
- ArXiv
- 2019

This work presents an efficient model-free, Q-learning-based algorithm for MDPs with a natural metric on the state-action space that does not require access to a black-box planning oracle.

Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning

- Computer Science, Mathematics
- NIPS
- 2017

This work introduces Uniform-PAC, a new framework for theoretically measuring the performance of reinforcement learning algorithms that strengthens the classical Probably Approximately Correct (PAC) framework, and presents an algorithm that simultaneously achieves optimal regret and PAC guarantees except for a factor of the horizon.

Is Q-learning Provably Efficient?

- Computer Science, Mathematics
- NeurIPS
- 2018

Model-free reinforcement learning (RL) algorithms, such as Q-learning, directly parameterize and update value functions or policies without explicitly modeling the environment. They are typically…

Contextual Bandits with Continuous Actions: Smoothing, Zooming, and Adapting

- Computer Science, Mathematics
- COLT
- 2019

Two qualitatively different regret bounds are obtained: one competes with a smoothed version of the policy class under no continuity assumptions, while the other requires standard Lipschitz assumptions.

Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

- Computer Science, Mathematics
- NIPS
- 2012

The proposed algorithm combines state aggregation with the use of upper confidence bounds for implementing optimism in the face of uncertainty and derives sublinear regret bounds for undiscounted reinforcement learning in continuous state space.

Adaptive aggregation for reinforcement learning in average reward Markov decision processes

- Mathematics, Computer Science
- Ann. Oper. Res.
- 2013

This work presents an algorithm that aggregates states online while learning to behave optimally in an average reward Markov decision process and derives bounds on the regret this algorithm suffers with respect to an optimal policy.