# Learning Two-Player Mixture Markov Games: Kernel Function Approximation and Correlated Equilibrium

@article{Li2022LearningTM, title={Learning Two-Player Mixture Markov Games: Kernel Function Approximation and Correlated Equilibrium}, author={Chris Junchi Li and Dongruo Zhou and Quanquan Gu and Michael I. Jordan}, journal={ArXiv}, year={2022}, volume={abs/2208.05363} }

We consider learning Nash equilibria in two-player zero-sum Markov games with nonlinear function approximation, where the action-value function is approximated by a function in a reproducing kernel Hilbert space (RKHS). The key challenge is how to perform exploration in this high-dimensional function space. We propose a novel online learning algorithm that finds a Nash equilibrium by minimizing the duality gap. At the core of our algorithms are upper and lower confidence bounds that are derived based on…
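For a static zero-sum matrix game, the duality gap that the abstract's algorithm minimizes can be computed directly: it measures how much either player could gain by deviating, and it vanishes exactly at a Nash equilibrium. A minimal sketch (the payoff matrix and strategies below are illustrative examples, not from the paper):

```python
import numpy as np

def duality_gap(A, x, y):
    """Duality gap of the strategy pair (x, y) in a zero-sum matrix game
    with payoff matrix A, where the row player maximizes x^T A y.

    gap = max_x' x'^T A y - min_y' x^T A y'
        = max_i (A y)_i - min_j (x^T A)_j,
    which is >= 0 and equals 0 iff (x, y) is a Nash equilibrium.
    """
    return (A @ y).max() - (x @ A).min()

# Matching pennies: the unique Nash equilibrium is uniform play.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
uniform = np.array([0.5, 0.5])
print(duality_gap(A, uniform, uniform))   # gap is 0 at the equilibrium
biased = np.array([0.6, 0.4])
print(duality_gap(A, biased, biased))     # gap is positive away from it
```

In the Markov-game setting of the paper, the same gap is defined on value functions and driven to zero online; this fragment only illustrates the one-step notion.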

## References


### Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation

- Computer Science, ArXiv
- 2021

It is shown that Nash-UCRL-VTR can provably achieve an $\tilde{O}(dH\sqrt{T})$ regret, where $d$ is the dimension of the linear function class, $H$ is the length of the game, and $T$ is the total number of steps in the game, which suggests the optimality of the algorithm.

### Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

- Computer Science, COLT
- 2020

This work develops provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games with simultaneous moves and proposes an optimistic variant of the least-squares minimax value iteration algorithm.

### Towards General Function Approximation in Zero-Sum Markov Games

- Computer Science, ICLR
- 2022

In the decoupled setting where the agent controls a single player and plays against an arbitrary opponent, a new model-free algorithm is proposed and it is proved that sample complexity can be bounded by a generalization of Witness rank to Markov games.

### Feature-Based Q-Learning for Two-Player Stochastic Games

- Computer Science, ArXiv
- 2019

This work proposes a two-player Q-learning algorithm for approximating the Nash equilibrium strategy via sampling and proves that the algorithm is guaranteed to find an $\epsilon$-optimal strategy using no more than $\tilde{\mathcal{O}}(K/(\epsilon^{2}(1-\gamma)^{4}))$ samples with high probability.

### Learning Nash Equilibrium for General-Sum Markov Games from Batch Data

- Economics, Computer Science, AISTATS
- 2017

This work proposes a new definition of $\epsilon$-Nash equilibrium in Markov games (MGs) that captures a strategy's quality in multiplayer settings, and introduces a neural-network architecture named NashNetwork that successfully learns a Nash equilibrium in a generic multiplayer general-sum turn-based MG.

### Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity

- Computer Science, AISTATS
- 2020

This work settles, up to polylogarithmic factors, the sample complexity of solving discounted two-player turn-based zero-sum stochastic games, by showing how to generalize a near-optimal Q-learning-based algorithm for MDPs, in particular that of Sidford et al. (2018), to two-player strategy computation.

### Value Function Approximation in Zero-Sum Markov Games

- Computer Science, Mathematics, UAI
- 2002

The viability of value function approximation for Markov games is demonstrated by using the least-squares policy iteration (LSPI) algorithm to learn good policies for a soccer domain and a flow control problem.

### Online Reinforcement Learning in Stochastic Games

- Computer Science, NIPS
- 2017

The UCSG algorithm is proposed, which achieves sublinear regret relative to the game value when competing with an arbitrary opponent, improving previous results under the same setting.

### Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design

- Computer Science, ICML
- 2010

This work analyzes GP-UCB, an intuitive upper-confidence-based algorithm, and bounds its cumulative regret in terms of the maximal information gain, establishing a novel connection between GP optimization and experimental design and obtaining explicit sublinear regret bounds for many commonly used covariance functions.
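A minimal sketch of the GP-UCB rule described above, selecting at each round the candidate maximizing posterior mean plus a confidence-width bonus. The RBF kernel, objective, and all hyperparameter values are illustrative assumptions, not settings from the paper:

```python
import numpy as np

def rbf_kernel(A, B, length_scale):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * length_scale ** 2))

def gp_ucb(f, candidates, n_rounds=30, beta=2.0, length_scale=0.2,
           noise=1e-3, seed=0):
    """GP-UCB over a finite candidate set: query the point with the
    largest posterior mean + beta * posterior standard deviation."""
    rng = np.random.default_rng(seed)
    X = [candidates[rng.integers(len(candidates))]]  # random first query
    y = [f(X[0])]
    for _ in range(n_rounds - 1):
        Xa, ya = np.array(X), np.array(y)
        K = rbf_kernel(Xa, Xa, length_scale) + noise * np.eye(len(Xa))
        K_inv = np.linalg.inv(K)
        Ks = rbf_kernel(candidates, Xa, length_scale)
        mu = Ks @ K_inv @ ya                              # posterior mean
        var = 1.0 - np.einsum('ij,jk,ik->i', Ks, K_inv, Ks)
        sigma = np.sqrt(np.maximum(var, 1e-12))           # posterior std
        x_next = candidates[np.argmax(mu + beta * sigma)]  # UCB rule
        X.append(x_next)
        y.append(f(x_next))
    best = int(np.argmax(y))
    return X[best], y[best]
```

For example, maximizing `f(x) = -(x - 0.3)**2` over a grid on `[0, 1]` returns a point near `0.3`; the bonus term drives space-filling exploration early on, after which the posterior mean concentrates queries near the maximizer.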

### On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game

- Computer ScienceICML
- 2021

This work establishes the first provably efficient reward-free RL algorithm with kernel and neural function approximators, and designs exploration and planning algorithms for both single-agent MDPs and zero-sum Markov games.