# Learning Two-Player Mixture Markov Games: Kernel Function Approximation and Correlated Equilibrium

@article{Li2022LearningTM,
  title={Learning Two-Player Mixture Markov Games: Kernel Function Approximation and Correlated Equilibrium},
  author={Chris Junchi Li and Dongruo Zhou and Quanquan Gu and Michael I. Jordan},
  journal={ArXiv},
  year={2022},
  volume={abs/2208.05363}
}
• Published 10 August 2022 • Computer Science • ArXiv
We consider learning Nash equilibria in two-player zero-sum Markov games with nonlinear function approximation, where the action-value function is approximated by a function in a Reproducing Kernel Hilbert Space (RKHS). The key challenge is how to perform exploration in the high-dimensional function space. We propose a novel online learning algorithm that finds a Nash equilibrium by minimizing the duality gap. At the core of our algorithm are upper and lower confidence bounds that are derived based on…
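As a one-shot analogue of the duality-gap objective described in the abstract, the sketch below computes the gap for a zero-sum matrix game. This is a simplification of my own (a single-state game rather than the paper's Markov-game setting, with hypothetical function names), not the paper's algorithm:

```python
import numpy as np

def duality_gap(A, x, y):
    """Duality gap of the mixed-strategy pair (x, y) in the zero-sum matrix
    game with payoff matrix A, where the row player maximizes x^T A y.
    The gap is zero exactly when (x, y) is a Nash equilibrium."""
    best_row_response = (A @ y).max()  # row player's best-response value against y
    best_col_response = (x @ A).min()  # column player's best-response value against x
    return best_row_response - best_col_response
```

For matching pennies, the uniform strategy pair attains gap 0 (a Nash equilibrium), while any pure-strategy pair has gap 2.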

## References

SHOWING 1-10 OF 54 REFERENCES

• Computer Science • ArXiv • 2021
It is shown that Nash-UCRL-VTR can provably achieve an $\tilde{O}(dH\sqrt{T})$ regret, where $d$ is the linear function dimension, $H$ is the length of the game, and $T$ is the total number of steps in the game, which suggests the optimality of the algorithm.
• Computer Science • COLT • 2020
This work develops provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games with simultaneous moves and proposes an optimistic variant of the least-squares minimax value iteration algorithm.
• Computer Science • ICLR • 2022
In the decoupled setting where the agent controls a single player and plays against an arbitrary opponent, a new model-free algorithm is proposed, and it is proved that its sample complexity can be bounded by a generalization of Witness rank to Markov games.
• Computer Science • ArXiv • 2019
This work proposes a two-player Q-learning algorithm for approximating the Nash equilibrium strategy via sampling and proves that the algorithm is guaranteed to find an $\epsilon$-optimal strategy using no more than $\tilde{\mathcal{O}}(K/(\epsilon^{2}(1-\gamma)^{4}))$ samples with high probability.
• Economics, Computer Science • AISTATS • 2017
A new definition of $\epsilon$-Nash equilibrium in MGs is given which captures the strategy's quality in multiplayer games, and a neural network architecture named NashNetwork is introduced that successfully learns a Nash equilibrium in a generic multiplayer general-sum turn-based MG.
• Computer Science • AISTATS • 2020
This work settles, up to polylogarithmic factors, the sample complexity of solving discounted two-player turn-based zero-sum stochastic games by showing how to generalize a near-optimal Q-learning-based algorithm for MDPs, in particular that of Sidford et al. (2018), to two-player strategy computation.
• Computer Science, Mathematics • UAI • 2002
The viability of value function approximation for Markov games is demonstrated by using the least-squares policy iteration (LSPI) algorithm to learn good policies for a soccer domain and a flow control problem.
• Computer Science • NIPS • 2017
The UCSG algorithm is proposed, which achieves sublinear regret relative to the game value when competing with an arbitrary opponent, improving on previous results in the same setting.
• Computer Science • ICML • 2010
This work analyzes GP-UCB, an intuitive upper-confidence-based algorithm, and bounds its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design and obtaining explicit sublinear regret bounds for many commonly used covariance functions.
• Computer Science • ICML • 2021
This work establishes the first provably efficient reward-free RL algorithm with kernel and neural function approximators, and designs exploration and planning algorithms for both single-agent MDPs and zero-sum Markov games.
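The GP-UCB entry above (ICML 2010) selects, at each round, the candidate maximizing the Gaussian-process posterior mean plus a scaled posterior standard deviation. A minimal sketch of one acquisition step under assumed choices (RBF kernel with unit prior variance, fixed confidence width `beta`; all function names here are my own, not from the paper):

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel matrix between the rows of A and B.
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

def gp_ucb_step(X_obs, y_obs, X_cand, beta=2.0, noise=0.1):
    """One GP-UCB acquisition step: return the index of the candidate in
    X_cand maximizing posterior mean + beta * posterior std."""
    K = rbf_kernel(X_obs, X_obs) + noise**2 * np.eye(len(X_obs))
    K_s = rbf_kernel(X_cand, X_obs)
    mu = K_s @ np.linalg.solve(K, y_obs)            # posterior mean
    v = np.linalg.solve(K, K_s.T)
    var = 1.0 - np.einsum('ij,ji->i', K_s, v)       # posterior variance (unit prior)
    ucb = mu + beta * np.sqrt(np.maximum(var, 0.0))
    return int(np.argmax(ucb))
```

Repeatedly querying the selected candidate and appending the observation drives the sampled points toward the maximizer, which is the mechanism behind the information-gain regret bound summarized above.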