Corpus ID: 207870436

Frequentist Regret Bounds for Randomized Least-Squares Value Iteration

  • Authors: Andrea Zanette, David Brandfonbrener, M. Pirotta, A. Lazaric
  • Published: 2020
  • Fields: Mathematics, Computer Science
  • Venue: ArXiv

Abstract: We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning (RL). When the state space is large or continuous, traditional tabular approaches are infeasible and some form of function approximation is mandatory. In this paper, we introduce an optimistically-initialized variant of the popular randomized least-squares value iteration (RLSVI), a model-free algorithm where exploration is induced by perturbing the least-squares approximation of the action-value function…
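The abstract describes exploration driven by perturbing the least-squares estimate of the action-value function. A minimal sketch of one such perturbed regression step, assuming Gaussian noise scaled by the inverse regularized design matrix (as in RLSVI-style algorithms); the function name and parameters are illustrative, not from the paper:

```python
import numpy as np

def perturbed_lsvi_step(Phi, targets, sigma=1.0, lam=1.0, rng=None):
    """One randomized least-squares regression step (illustrative sketch).

    Phi: (n, d) feature matrix of observed state-action pairs.
    targets: (n,) regression targets (e.g., reward plus next-step value).
    Returns a perturbed weight vector: the ridge solution plus Gaussian
    noise with covariance sigma^2 * Sigma^{-1}, which is what induces
    exploration in RLSVI-style methods.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = Phi.shape[1]
    # Regularized design matrix and ridge least-squares solution.
    Sigma = Phi.T @ Phi + lam * np.eye(d)
    theta_hat = np.linalg.solve(Sigma, Phi.T @ targets)
    # Exploration noise ~ N(0, sigma^2 * Sigma^{-1}).
    noise = rng.multivariate_normal(np.zeros(d), sigma**2 * np.linalg.inv(Sigma))
    return theta_hat + noise
```

With `sigma=0` this reduces to plain ridge regression; larger `sigma` injects more exploration along poorly-explored feature directions, where `Sigma^{-1}` is large.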
Citing papers:
  • Learning Near Optimal Policies with Low Inherent Bellman Error
  • On Reward-Free Reinforcement Learning with Linear Function Approximation
  • Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium
  • Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration
  • A Unifying View of Optimism in Episodic Reinforcement Learning
  • Oracle-Efficient Reinforcement Learning in Factored MDPs with Unknown Structure
  • Efficient Learning in Non-Stationary Linear Markov Decision Processes

