Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping
- Dongruo Zhou, Jiafan He, Quanquan Gu
- Computer Science, International Conference on Machine Learning
- 23 June 2020
This paper proposes a novel algorithm that makes use of the feature mapping and obtains the first polynomial regret bound, and suggests that the proposed reinforcement learning algorithm is near-optimal up to a $(1-\gamma)^{-0.5}$ factor.
Logarithmic Regret for Reinforcement Learning with Linear Function Approximation
- Jiafan He, Dongruo Zhou, Quanquan Gu
- Computer Science, International Conference on Machine Learning
- 23 November 2020
It is shown that logarithmic regret is attainable under two recently proposed linear MDP assumptions provided that there exists a positive sub-optimality gap for the optimal action-value function.
Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs
- Jiafan He, Dongruo Zhou, Quanquan Gu
- Computer Science, Neural Information Processing Systems
- 1 October 2020
The model-based algorithm named UCBVI-γ, which is based on the optimism-in-the-face-of-uncertainty principle and a Bernstein-type bonus, achieves an Õ(√(SAT)/(1−γ)) regret, which suggests that UCBVI-γ is nearly minimax optimal for discounted MDPs.
Learning Stochastic Shortest Path with Linear Function Approximation
- Yifei Min, Jiafan He, Tianhao Wang, Quanquan Gu
- Computer Science, International Conference on Machine Learning
- 25 October 2021
A novel algorithm with Hoeffding-type confidence sets for learning the linear mixture SSP, which provably achieves a near-optimal regret guarantee, together with a proven lower bound of Ω(dB⋆√K).
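The Hoeffding-type confidence sets mentioned above can be illustrated with a toy sketch (hypothetical, not the paper's SSP algorithm): maintain a ridge-regression estimate of an unknown parameter from features, and add an optimistic bonus proportional to the feature's norm under the inverse Gram matrix, which shrinks as data accumulates.

```python
# Toy sketch of a Hoeffding-style optimistic bonus for a linear model.
# All names (update, optimistic_value, beta) are illustrative choices,
# not the paper's notation or algorithm.
import numpy as np

def update(Lambda, b, phi, target):
    """Rank-one update of the Gram matrix and the response vector."""
    return Lambda + np.outer(phi, phi), b + target * phi

def optimistic_value(Lambda, b, phi, beta):
    """Point estimate plus a bonus beta * ||phi||_{Lambda^{-1}}."""
    theta_hat = np.linalg.solve(Lambda, b)
    bonus = beta * np.sqrt(phi @ np.linalg.solve(Lambda, phi))
    return phi @ theta_hat + bonus

d = 3
lam, beta = 1.0, 0.5
Lambda, b = lam * np.eye(d), np.zeros(d)       # ridge-regularized Gram matrix
theta_star = np.array([0.2, -0.1, 0.4])        # unknown true parameter
rng = np.random.default_rng(0)
for _ in range(200):
    phi = rng.normal(size=d)
    phi /= np.linalg.norm(phi)
    Lambda, b = update(Lambda, b, phi, phi @ theta_star)  # noiseless targets

# After many observations the estimate is accurate and the bonus is small,
# so the optimistic value is close to the true value phi @ theta_star.
phi = np.array([1.0, 0.0, 0.0])
print(optimistic_value(Lambda, b, phi, beta))
```

The key design point is that optimism is directional: the bonus is large only along feature directions the data has not yet covered, which is what drives exploration in confidence-set-based methods.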
Provably Efficient Representation Learning in Low-rank Markov Decision Processes
- Weitong Zhang, Jiafan He, Dongruo Zhou, Amy Zhang, Quanquan Gu
- Computer Science, ArXiv
- 22 June 2021
A provably efficient algorithm called ReLEX is proposed that can simultaneously learn the representation and perform exploration, and that is strictly better in terms of sample efficiency if the function class of representations enjoys a certain mild "coverage" property over the whole state-action space.
Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes
- Jiafan He, Heyang Zhao, Dongruo Zhou, Quanquan Gu
- Mathematics, Computer Science, ArXiv
- 12 December 2022
This work proposes the first computationally efficient algorithm that achieves the nearly minimax optimal regret and uses a rare-switching policy to update the value function estimator to control the complexity of the estimated value function class.
Achieving a Fairer Future by Changing the Past
- Jiafan He, Ariel D. Procaccia, Alexandros Psomas, David Zeng
- Economics, International Joint Conference on Artificial…
- 1 August 2019
It is shown that algorithms that are informed about the values of future items can get by without any adjustments, whereas uninformed algorithms require Θ(T) adjustments, and an uninformed algorithm that requires only O(T) adjustments is designed.
Locally Differentially Private Reinforcement Learning for Linear Mixture Markov Decision Processes
- Chonghua Liao, Jiafan He, Quanquan Gu
- Computer Science, ArXiv
- 19 October 2021
A novel (ε, δ)-LDP algorithm for learning a class of Markov decision processes (MDPs) dubbed linear mixture MDPs, which obtains local differential privacy guarantees.
Robust precision attitude tracking of an uncertain rigid spacecraft based on regulation theory
- Jiafan He, A. Sheng, Dabo Xu, Zhiyong Chen, Dan Wang
- Mathematics, International Journal of Robust and Nonlinear…
- 2 May 2019
This paper explores regulation theory for the design of robust precision attitude tracking of an uncertain rigid spacecraft with external disturbances. Focusing on the attitude system in terms of…
Near-optimal Policy Optimization Algorithms for Learning Adversarial Linear Mixture MDPs
- Jiafan He, Dongruo Zhou, Quanquan Gu
- Computer Science, International Conference on Artificial…
- 17 February 2021
This paper proposes an optimistic policy optimization algorithm named POWERS, shows that it can achieve an Õ(dH√T) regret, and proves a matching lower bound of Ω̃(dH√T) up to logarithmic factors.
...