Corpus ID: 246485834

Near-Optimal Learning of Extensive-Form Games with Imperfect Information

  title={Near-Optimal Learning of Extensive-Form Games with Imperfect Information},
  author={Yu Bai and Chi Jin and Song Mei and Tiancheng Yu},
This paper resolves the open question of designing near-optimal algorithms for learning imperfect-information extensive-form games from bandit feedback. We present the first line of algorithms that require only Õ((XA+YB)/ε²) episodes of play to find an ε-approximate Nash equilibrium in two-player zero-sum games, where X, Y are the number of information sets and A, B are the number of actions for the two players. This improves upon the best known sample complexity of Õ((X²A+Y²B)/ε²) by a factor of… 
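As a rough sanity check on the claimed improvement, the ratio of the two bounds can be computed for illustrative problem sizes. This is a sketch that ignores the Õ(·) logarithmic factors and assumes the prior best bound has the form (X²A+Y²B)/ε² against the new (XA+YB)/ε²; the chosen values of X, Y, A, B, ε are arbitrary:

```python
# Illustrative comparison of the two sample-complexity bounds (log factors ignored).
# X, Y: information-set counts; A, B: action counts; eps: target accuracy.

def new_bound(X, Y, A, B, eps):
    return (X * A + Y * B) / eps ** 2

def prior_bound(X, Y, A, B, eps):
    return (X ** 2 * A + Y ** 2 * B) / eps ** 2

X, Y, A, B, eps = 1000, 800, 10, 10, 0.1
ratio = prior_bound(X, Y, A, B, eps) / new_bound(X, Y, A, B, eps)
# The savings factor is a weighted average of X and Y,
# so it always lies between min(X, Y) and max(X, Y).
print(f"episodes saved: roughly a factor of {ratio:.0f}")
```

Note the ε-dependence cancels in the ratio; the improvement is purely in the dependence on the game size.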


Generalized Bandit Regret Minimizer Framework in Imperfect Information Extensive-Form Game
This paper presents a theoretical framework for the design and modular analysis of bandit regret-minimization methods, and describes a novel method, SIX-OMD, that learns an approximate Nash equilibrium in IIEGs.
Efficient Φ-Regret Minimization in Extensive-Form Games via Online Mirror Descent
An improved algorithm with balancing techniques is designed that achieves a sharp EFCE-regret bound under bandit feedback in an EFG with X information sets, A actions, and T episodes, which, to the best of the authors' knowledge, matches the information-theoretic lower bound.
Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation
It is shown that Nash-UCRL-VTR provably achieves Õ(dH√T) regret, where d is the dimension of the linear function class, H is the length of the game, and T is the total number of steps in the game, which suggests the optimality of the algorithm.
Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium
This work develops provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games with simultaneous moves and proposes an optimistic variant of the least-squares minimax value iteration algorithm.
Monte Carlo Sampling for Regret Minimization in Extensive Games
A general family of domain-independent, sample-based CFR algorithms called Monte Carlo counterfactual regret minimization (MCCFR) is described, of which the original and poker-specific versions are special cases.
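The per-information-set update at the core of CFR/MCCFR is regret matching driven by sampled play. A minimal sketch of that update follows, applied to a one-shot zero-sum matrix game (matching pennies) rather than a full game tree; the payoff matrix, sampling scheme, and iteration count are illustrative assumptions, not the paper's construction:

```python
import random

# Matching pennies: PAYOFF[a][b] is the row player's payoff (zero-sum).
PAYOFF = [[1.0, -1.0], [-1.0, 1.0]]

def strategy_from_regrets(regrets):
    """Regret matching: play actions proportionally to positive cumulative regret."""
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    n = len(regrets)
    return [p / total for p in positives] if total > 0 else [1.0 / n] * n

def run(iterations=50000, seed=0):
    rng = random.Random(seed)
    regrets_row, regrets_col = [0.0, 0.0], [0.0, 0.0]
    strategy_sum = [0.0, 0.0]  # row player's cumulative strategy
    for _ in range(iterations):
        s_row = strategy_from_regrets(regrets_row)
        s_col = strategy_from_regrets(regrets_col)
        a = rng.choices([0, 1], weights=s_row)[0]  # sampled play, as in MCCFR
        b = rng.choices([0, 1], weights=s_col)[0]
        for i in range(2):
            # regret of having played i instead of the sampled action
            regrets_row[i] += PAYOFF[i][b] - PAYOFF[a][b]
            # column player minimizes the row player's payoff
            regrets_col[i] += PAYOFF[a][b] - PAYOFF[a][i]
        for i in range(2):
            strategy_sum[i] += s_row[i]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]

avg = run()  # average strategy converges toward the uniform equilibrium
```

In a zero-sum game, it is the time-averaged strategy (not the current one) that converges to equilibrium, which is why the sketch accumulates `strategy_sum`.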
Towards General Function Approximation in Zero-Sum Markov Games
In the decoupled setting where the agent controls a single player and plays against an arbitrary opponent, a new model-free algorithm is proposed and it is proved that sample complexity can be bounded by a generalization of Witness rank to Markov games.
Model-Free Online Learning in Unknown Sequential Decision Making Problems and Games
An efficient algorithm is given that achieves O(T^{3/4}) regret with high probability for that setting, even when the agent faces an adversarial environment, and significantly outperforms the prior algorithms for the problem.
Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity
The sample complexity of solving discounted two-player turn-based zero-sum stochastic games is settled up to polylogarithmic factors by showing how to generalize a near-optimal Q-learning-based algorithm for MDPs, in particular that of Sidford et al. (2018), to two-player strategy computation.
Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall
The Implicit Exploration Online Mirror Descent (IXOMD) algorithm is provided, a model-free algorithm with a high-probability bound on the convergence rate to the NE of order 1/√T, where T is the number of played games.
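IXOMD's policy updates build on online mirror descent. With the negative-entropy mirror map on the probability simplex, one OMD step reduces to a multiplicative-weights update, sketched below on fixed, illustrative losses; the implicit-exploration loss estimators that give IXOMD its high-probability guarantee are omitted here:

```python
import math

def omd_entropy_step(p, loss, eta):
    """One OMD step with the negative-entropy mirror map:
    p_i <- p_i * exp(-eta * loss_i), then renormalize (multiplicative weights)."""
    w = [pi * math.exp(-eta * li) for pi, li in zip(p, loss)]
    z = sum(w)
    return [wi / z for wi in w]

# Fixed illustrative losses: action 1 has the smallest loss,
# so its probability should grow toward 1.
p = [1.0 / 3, 1.0 / 3, 1.0 / 3]
losses = [0.9, 0.1, 0.5]
for _ in range(200):
    p = omd_entropy_step(p, losses, eta=0.1)
```

Against time-varying (e.g. bandit-estimated) losses, the same step is used with a step size η tuned to the horizon, which is where the 1/√T rate comes from.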
Faster First-Order Methods for Extensive-Form Game Solving
A specific distance-generating function, namely the dilated entropy function, is investigated over treeplexes, convex polytopes that encompass the strategy spaces of perfect-recall extensive-form games, and significantly stronger bounds on the associated strong-convexity parameter are developed.
Solving Games with Functional Regret Estimation
A novel online learning method for minimizing regret in large extensive-form games is presented that learns a function approximator online to estimate the regret of choosing a particular action; the approach is proved sound via a bound relating the quality of the function approximation to the regret of the algorithm.
Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games
An algorithm is proposed in which each agent independently runs optimistic V-learning (a variant of Q-learning) to efficiently explore the unknown environment, while using a stabilized online mirror descent (OMD) subroutine for policy updates; this appears to be the first sample-complexity result for learning in generic general-sum Markov games.