# Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation

@inproceedings{Hu2022NearlyMO, title={Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation}, author={Pihe Hu and Yu Chen and Longbo Huang}, booktitle={International Conference on Machine Learning}, year={2022} }

We study reinforcement learning with linear function approximation where the transition probability and reward functions are linear with respect to a feature mapping φ(s, a). Specifically, we consider the episodic inhomogeneous linear Markov Decision Process (MDP), and propose a novel computation-efficient algorithm, LSVI-UCB⁺, which achieves an $\widetilde{O}(Hd\sqrt{T})$ regret bound, where H is the episode length, d is the feature dimension, and T is the number of steps. LSVI-UCB⁺ builds on…
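The core of LSVI-style algorithms is ridge regression on observed transition data plus an elliptical-norm exploration bonus. The sketch below illustrates that generic optimistic update with synthetic data; it is not the paper's LSVI-UCB⁺, and all names and constants (`beta`, `lam`, the random features) are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of a generic optimistic least-squares value-iteration
# update: ridge regression on past (state, action) features plus a UCB bonus
# beta * sqrt(phi^T Lambda^{-1} phi). Data and constants are synthetic.

rng = np.random.default_rng(0)
d, n = 4, 50                      # feature dimension, samples collected so far
lam, beta = 1.0, 1.0              # ridge parameter, bonus scale (hypothetical)

Phi = rng.normal(size=(n, d))     # features phi(s_i, a_i) of past pairs
targets = rng.uniform(size=n)     # stand-ins for r_i + max_a Q_{h+1}(s'_i, a)

# Lambda = lam * I + sum_i phi_i phi_i^T  (regularized Gram matrix)
Lambda = lam * np.eye(d) + Phi.T @ Phi
w = np.linalg.solve(Lambda, Phi.T @ targets)   # ridge-regression weights

def q_value(phi):
    """Optimistic Q estimate: linear prediction plus elliptical-norm bonus."""
    bonus = beta * np.sqrt(phi @ np.linalg.solve(Lambda, phi))
    return float(phi @ w + bonus)

phi_query = rng.normal(size=d)
print(q_value(phi_query))
```

The bonus term shrinks as a direction of feature space is visited more often, which is what drives the exploration guarantees in this line of work.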

## References


### Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes

- Computer Science · COLT
- 2021

A new Bernstein-type concentration inequality for self-normalized martingales for linear bandit problems with bounded noise and a new, computationally efficient algorithm with linear function approximation named UCRL-VTR for the aforementioned linear mixture MDPs in the episodic undiscounted setting are proposed.

### Provably Efficient Reinforcement Learning with Linear Function Approximation

- Computer Science · COLT
- 2020

This paper proves that an optimistic modification of Least-Squares Value Iteration (LSVI) achieves a regret bound in terms of d, the ambient dimension of the feature space, H, the length of each episode, and T, the total number of steps; the bound is independent of the number of states and actions.

### Bandit Algorithms

- Mathematics
- 2020

sets of environments and policies respectively, and $\ell : \mathcal{E} \times \Pi \to [0, 1]$ a bounded loss function. Given a policy $\pi$, let $\ell(\pi) = (\ell(\nu_1, \pi), \ldots, \ell(\nu_N, \pi))$ be the loss vector resulting from policy $\pi$.…
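The loss-vector definition above can be sketched directly: a bounded loss evaluated for one policy across N environments. Everything here (the toy parameter vectors and the particular loss) is a hypothetical illustration, not anything from the book.

```python
import numpy as np

# Hypothetical sketch of l(pi) = (l(nu_1, pi), ..., l(nu_N, pi)):
# a bounded loss l : E x Pi -> [0, 1] evaluated across N environments.

rng = np.random.default_rng(1)
N = 3                                            # environments nu_1, ..., nu_N
envs = [rng.uniform(size=4) for _ in range(N)]   # toy environment parameters
policy = rng.uniform(size=4)                     # toy policy parameters

def loss(env, pi):
    """Bounded loss in [0, 1]; a squashed distance, purely illustrative."""
    return float(min(1.0, np.linalg.norm(env - pi) / 4.0))

loss_vector = np.array([loss(nu, policy) for nu in envs])  # l(pi)
print(loss_vector)
```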

### Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes

- Mathematics, Computer Science · ArXiv
- 2022

This work proposes the first computationally efficient algorithm that achieves the nearly minimax optimal regret for episodic time-inhomogeneous linear Markov decision processes (linear MDPs).

### Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension

- Computer Science, Mathematics · NeurIPS
- 2020

This paper establishes a provably efficient RL algorithm with general value function approximation that achieves a regret bound of $\widetilde{O}(\mathrm{poly}(dH)\sqrt{T})$ and provides a framework to justify the effectiveness of algorithms used in practice.

### Model-Based Reinforcement Learning with Value-Targeted Regression

- Computer Science · L4DC
- 2020

This paper proposes a model based RL algorithm that is based on optimism principle, and derives a bound on the regret, which is independent of the total number of states or actions, and is close to a lower bound $\Omega(\sqrt{HdT})$.

### Improved Optimistic Algorithms for Logistic Bandits

- Computer Science · ICML
- 2020

A new optimistic algorithm is proposed based on a finer examination of the non-linearities of the reward function; it enjoys a $\tilde{\mathcal{O}}(\sqrt{T})$ regret with no dependence on $\kappa$ except in a second-order term.

### Provably Efficient Exploration in Policy Optimization

- Computer Science · ICML
- 2020

This paper proves that, in the problem of episodic Markov decision processes with linear function approximation, unknown transitions, and adversarial rewards with full-information feedback, the proposed algorithm OPPO achieves a sublinear regret bound.

### Optimism in Reinforcement Learning with Generalized Linear Function Approximation

- Computer Science · ICLR
- 2021

This work designs a new provably efficient algorithm for episodic reinforcement learning with generalized linear function approximation that enjoys a regret bound of $\tilde{O}(\sqrt{d^3 T})$ where d is the dimensionality of the state-action features and T is the number of episodes.