√n-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank

@article{Dong2020nRegretFL,
  title={√n-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank},
  author={Kefan Dong and Jian Peng and Yining Wang and Yuan Zhou},
  journal={ArXiv},
  year={2020},
  volume={abs/1909.02506}
}
In this paper, we consider the problem of online learning of Markov decision processes (MDPs) with very large state spaces. Under the assumptions of realizable function approximation and low Bellman rank, we develop an online learning algorithm that learns the optimal value function while at the same time achieving very low cumulative regret during the learning process. Our learning algorithm, Adaptive Value-function Elimination (AVE), is inspired by the policy elimination algorithm proposed…
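To make the elimination idea concrete, here is a minimal Python sketch of value-function elimination under assumed toy interfaces: a finite candidate class of Q-functions, a list of observed transitions, and a fixed action set `ACTIONS`. All names and the threshold are hypothetical illustrations of the general elimination principle, not the paper's AVE algorithm, which additionally designs exploration policies and confidence thresholds to obtain the √n regret guarantee.

```python
import numpy as np

ACTIONS = [0, 1]  # assumed finite action set for this toy example

def avg_bellman_error(q, rollouts):
    """Empirical average Bellman error |E[q(s,a) - r - max_a' q(s',a')]|
    estimated from observed transitions (s, a, r, s_next)."""
    residuals = [
        q(s, a) - (r + max(q(s_next, a2) for a2 in ACTIONS))
        for (s, a, r, s_next) in rollouts
    ]
    return abs(np.mean(residuals))

def eliminate(candidates, rollouts, threshold):
    """Discard candidate Q-functions with large estimated Bellman error.
    Under realizability, the optimal Q-function has zero true Bellman
    error, so with a suitably chosen threshold it survives elimination
    with high probability."""
    return [q for q in candidates if avg_bellman_error(q, rollouts) <= threshold]

# Toy usage on integer states with zero rewards: the consistent candidate
# survives, while the inconsistent one is eliminated.
q_good = lambda s, a: 0.0
q_bad = lambda s, a: 1.0 if a == 1 else 0.0
rollouts = [(0, 1, 0.0, 1), (1, 0, 0.0, 0)]  # (s, a, r, s_next) tuples
print(len(eliminate([q_good, q_bad], rollouts, threshold=0.1)))  # -> 1
```

Note that the sketch averages signed residuals rather than absolute ones; this matches the average Bellman error that low-Bellman-rank conditions control, as opposed to the (larger) Bellman residual.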
Citations

Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
Nonstationary Reinforcement Learning with Linear Function Approximation
Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms
Bilinear Classes: A Structural Framework for Provable Generalization in RL
On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces
Online Learning for Unknown Partially Observable MDPs
Provably Efficient Cooperative Multi-Agent Reinforcement Learning with Function Approximation
