Corpus ID: 236635303

Towards General Function Approximation in Zero-Sum Markov Games

  title={Towards General Function Approximation in Zero-Sum Markov Games},
  author={Baihe Huang and Jason D. Lee and Zhaoran Wang and Zhuoran Yang},
This paper considers two-player zero-sum finite-horizon Markov games with simultaneous moves. The study focuses on the challenging settings where the value function or the model is parameterized by general function classes. Provably efficient algorithms for both decoupled and coordinated settings are developed. In the decoupled setting where the agent controls a single player and plays against an arbitrary opponent, we propose a new model-free algorithm. The sample complexity is governed by the… Expand


Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games
This paper provides a novel and unified error propagation analysis in Lp-norm of three well-known algorithms adapted to Stochastic Games and shows that it can achieve a stationary policy which is 2γe+e′/(1-γ)2 -optimal. Expand
Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity
The sampling complexity of solving discounted two-player turn-based zero-sum stochastic games up to polylogarithmic factors is settled by showing how to generalize a near-optimal Q-learning based algorithms for MDP, in particular Sidford et al (2018), to two- player strategy computation algorithms. Expand
Learning Nash Equilibrium for General-Sum Markov Games from Batch Data
A new definition of $\epsilon$-Nash equilibrium in MGs which grasps the strategy's quality for multiplayer games and introduces a neural network architecture named NashNetwork that successfully learns a Nash equilibrium in a generic multiplayer general-sum turn-based MG. Expand
Feature-Based Q-Learning for Two-Player Stochastic Games
This work proposes a two-player Q-learning algorithm for approximating the Nash equilibrium strategy via sampling and proves that the algorithm is guaranteed to find an $\epsilon$-optimal strategy using no more than $\tilde{\mathcal{O}}(K/(\epsil on^{2}(1-\gamma)^{4}))$ samples with high probability. Expand
Online Reinforcement Learning in Stochastic Games
The UCSG algorithm is proposed that achieves a sublinear regret compared to the game value when competing with an arbitrary opponent, and this result improves previous ones under the same setting. Expand
Value Function Approximation in Zero-Sum Markov Games
The viability of value function approximation for Markov games is demonstrated by using the Least squares policy iteration (LSPI) algorithm to learn good policies for a soccer domain and a flow control problem. Expand
Provably Efficient Policy Gradient Methods for Two-Player Zero-Sum Markov Games
This is the first quantitative analysis of policy gradient methods with function approximation for two-player zero-sum Markov games and thoroughly characterize the algorithms’ performance in terms of the number of samples, number of iterations, concentrability coefficients, and approximation error. Expand
Near-Optimal Reinforcement Learning with Self-Play
An optimistic variant of the Nash Q-learning algorithm with sample complexity $\tilde{\mathcal{O}}(SAB)$, and a new \emph{Nash V-learning}, which matches the information-theoretic lower bound in all problem-dependent parameters except for a polynomial factor of the length of each episode. Expand
Actor-Critic Fictitious Play in Simultaneous Move Multistage Games
A stochastic approximation of the fictitious play process is built using an architecture inspired by actor-critic algorithms and it is proved convergence of the method towards a Nash equilibrium in both the cases of zero-sum two-player multistage games and cooperative multistages games. Expand
Actor-Critic Policy Optimization in Partially Observable Multiagent Environments
This paper examines the role of policy gradient and actor-critic algorithms in partially-observable multiagent environments and relates them to a foundation of regret minimization and multiagent learning techniques for the one-shot and tabular cases, leading to previously unknown convergence guarantees. Expand