Corpus ID: 211069002

Provable Self-Play Algorithms for Competitive Reinforcement Learning

  • Yu Bai, C. Jin
  • Published in ICML 2020
  • Computer Science, Mathematics
  • Self-play, where the algorithm learns by playing against itself without requiring any direct supervision, has become the new weapon in modern Reinforcement Learning (RL) for achieving superhuman performance in practice. However, the majority of existing theory in reinforcement learning only applies to the setting where the agent plays against a fixed environment; it remains largely open whether self-play algorithms can be provably effective, especially when it is necessary to manage the…
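To make the idea of self-play concrete, here is a minimal sketch of it in its simplest setting, assuming a two-player zero-sum matrix game with a known payoff matrix. Both players run the same Hedge (multiplicative-weights) update against each other, and their time-averaged strategies approach a Nash equilibrium. This is a generic textbook illustration, not the algorithm analyzed in the paper, which concerns reinforcement learning rather than a single known matrix game. The payoff matrix `A`, the helper names `softmax`, `hedge_self_play` and `duality_gap`, and the learning-rate choice are all illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hedge_self_play(A, T=10_000, payoff_range=2.0):
    """Self-play in a two-player zero-sum matrix game (illustrative sketch).

    A[i][j] is the row player's payoff; the column player receives -A[i][j].
    Both players run the same Hedge rule against each other, using exact
    expected payoffs. Returns the time-averaged mixed strategies.
    """
    n, m = len(A), len(A[0])
    # Fixed learning rates from the standard Hedge regret bound, rescaled for
    # payoffs lying in an interval of length `payoff_range` (an assumption).
    eta_row = math.sqrt(8 * math.log(n) / T) / payoff_range
    eta_col = math.sqrt(8 * math.log(m) / T) / payoff_range
    gain_row = [0.0] * n  # cumulative expected payoff of each row action
    gain_col = [0.0] * m  # cumulative expected payoff of each column action
    avg_x, avg_y = [0.0] * n, [0.0] * m
    for _ in range(T):
        # Both strategies are computed before either player updates.
        x = softmax([eta_row * g for g in gain_row])
        y = softmax([eta_col * g for g in gain_col])
        for i in range(n):
            avg_x[i] += x[i] / T
        for j in range(m):
            avg_y[j] += y[j] / T
        # Expected payoff of each pure action against the opponent's current mix.
        for i in range(n):
            gain_row[i] += sum(A[i][j] * y[j] for j in range(m))
        for j in range(m):
            gain_col[j] -= sum(A[i][j] * x[i] for i in range(n))
    return avg_x, avg_y


def duality_gap(A, x, y):
    """Best-response gap: zero exactly at a Nash equilibrium, positive otherwise."""
    n, m = len(A), len(A[0])
    best_row = max(sum(A[i][j] * y[j] for j in range(m)) for i in range(n))
    best_col = min(sum(A[i][j] * x[i] for i in range(n)) for j in range(m))
    return best_row - best_col


# Hypothetical example game whose unique Nash equilibrium is x = y = (0.4, 0.6).
A = [[1.0, -0.5],
     [-0.5, 0.5]]
x_bar, y_bar = hedge_self_play(A)
print(x_bar, y_bar, duality_gap(A, x_bar, y_bar))
```

Because Hedge has regret of order √T against any opponent, the duality gap of the averaged strategies is at most the sum of the two players' average regrets, so it shrinks roughly like 1/√T. For this example game the regret bound puts the gap below about 0.024 at T = 10,000.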
    12 Citations
    Near-Optimal Reinforcement Learning with Self-Play
    • 5
    Efficient Competitive Self-Play Policy Optimization
    A Sharp Analysis of Model-based Reinforcement Learning with Self-Play
    • 1
    Provably Efficient Online Agnostic Learning in Markov Games
    • Highly Influenced
    Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity
    • 6
    Independent Policy Gradient Methods for Competitive Reinforcement Learning
    • 1
    Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon
    • 4
    Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory
    Fictitious play in zero-sum stochastic games