Corpus ID: 209323916

Provably Efficient Exploration in Policy Optimization

  • Q. Cai, Zhuoran Yang, C. Jin, Zhaoran Wang
  • Published 2019
  • Computer Science, Mathematics
  • ArXiv
  • While policy-based reinforcement learning (RL) achieves tremendous successes in practice, it is significantly less understood in theory, especially compared with value-based RL. In particular, it remains elusive how to design a provably efficient policy optimization algorithm that incorporates exploration. To bridge such a gap, this paper proposes an Optimistic variant of the Proximal Policy Optimization algorithm (OPPO), which follows an "optimistic version" of the policy gradient direction…
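The abstract describes OPPO as a PPO-style update that follows an optimistic policy gradient direction. As a rough illustration only (not the paper's actual algorithm), the core idea of combining a KL-regularized multiplicative-weights policy update with a count-based optimism bonus can be sketched as follows; the function name, bonus form, and all parameter values here are illustrative assumptions:

```python
import numpy as np

def optimistic_policy_update(policy, q_estimate, visit_counts, lr=0.1, beta=1.0):
    """One mirror-descent policy step on an optimistic Q-estimate (illustrative sketch).

    policy:       (S, A) array of per-state action probabilities
    q_estimate:   (S, A) array of estimated action values
    visit_counts: (S, A) array of state-action visit counts
    lr, beta:     step size and bonus scale (hypothetical values)
    """
    # Optimism for exploration: inflate each Q-value by a count-based
    # bonus so rarely visited actions look more attractive.
    bonus = beta / np.sqrt(np.maximum(visit_counts, 1))
    optimistic_q = q_estimate + bonus

    # Multiplicative-weights step: the closed form of a KL-regularized
    # (PPO-style proximal) policy improvement in the tabular setting.
    new_policy = policy * np.exp(lr * optimistic_q)
    return new_policy / new_policy.sum(axis=1, keepdims=True)

# Toy example: 2 states, 3 actions, uniform initial policy.
rng = np.random.default_rng(0)
policy = np.full((2, 3), 1 / 3)
q = rng.normal(size=(2, 3))
counts = rng.integers(0, 5, size=(2, 3))
policy = optimistic_policy_update(policy, q, counts)
```

The update keeps each row a valid probability distribution while shifting mass toward actions that are either high-value or under-explored, which is the exploration mechanism the abstract alludes to.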
37 Citations
    • Optimistic Policy Optimization with Bandit Feedback (8 citations; Highly Influenced)
    • PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning (6 citations)
    • Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization (7 citations)
    • Reward-Free Exploration for Reinforcement Learning (24 citations)
    • Exploration-Exploitation in Constrained MDPs (11 citations)
    • Provably Efficient Reinforcement Learning with General Value Function Approximation (10 citations)
    • Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon (1 citation)
References
    • Exploration-Enhanced POLITEX (12 citations)
    • Is Q-learning Provably Efficient? (180 citations)
    • Dynamic policy programming (74 citations)
    • On the sample complexity of reinforcement learning (433 citations)
    • Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model (110 citations)
    • Proximal Policy Optimization Algorithms (2,956 citations; Highly Influential)