Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature
@article{Dong2021ProvableMN,
  title={Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature},
  author={Kefan Dong and Jia-Qi Yang and Tengyu Ma},
  journal={ArXiv},
  year={2021},
  volume={abs/2102.04168}
}
This paper studies model-based bandit and reinforcement learning (RL) with nonlinear function approximations. We propose to study convergence to approximate local maxima, because we show that global convergence is statistically intractable even for a one-layer neural net bandit with a deterministic reward. For both nonlinear bandits and RL, the paper presents a model-based algorithm, Virtual Ascent with Online Model Learner (ViOL), which provably converges to a local maximum with sample complexity…
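The core idea of the abstract — learn a model of the reward online and run gradient ascent on the *learned* model rather than the unobservable true reward — can be illustrated with a toy sketch. This is not the paper's ViOL algorithm; it is a minimal one-dimensional bandit with an assumed quadratic reward, where a polynomial fit to observed (action, reward) pairs plays the role of the learned model:

```python
import numpy as np

# Toy one-dimensional bandit. The learner never differentiates the true
# reward directly; it fits a model to observed (action, reward) pairs and
# performs "virtual" ascent on the learned model.
def true_reward(a):
    return -(a - 1.3) ** 2 + 0.5  # unknown to the learner; peak at a = 1.3

actions = [-1.0, 0.0, 1.0]                 # seed observations
rewards = [true_reward(x) for x in actions]

a = 0.0
for _ in range(100):
    # Online model learning: refit a degree-2 polynomial to all data so far.
    model = np.poly1d(np.polyfit(actions, rewards, 2))
    # Virtual ascent: gradient step on the learned model, not the true reward.
    a += 0.1 * model.deriv()(a)
    actions.append(float(a))
    rewards.append(true_reward(a))         # observe the true reward

# The iterate approaches the local (here also global) maximum at a = 1.3.
```

Because the reward here is deterministic and lies in the fitted model class, the learned model becomes exact after a few observations, so the virtual ascent iterates contract toward the true maximizer.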
References
SHOWING 1-10 OF 58 REFERENCES
- Algorithmic Framework for Model-based Reinforcement Learning with Theoretical Guarantees. ICLR 2019.
- Provably Efficient Reinforcement Learning with General Value Function Approximation. ArXiv 2020.
- Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches. COLT 2019.
- Provably Efficient Reinforcement Learning with Linear Function Approximation. COLT 2020.
- Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound. ICML 2020.
- Optimism in Reinforcement Learning with Generalized Linear Function Approximation. ArXiv 2019.
- Learning Near Optimal Policies with Low Inherent Bellman Error. ICML 2020.
- Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles. AISTATS 2020.