Learning Two-Player Mixture Markov Games: Kernel Function Approximation and Correlated Equilibrium

@article{Li2022LearningTM,
  title={Learning Two-Player Mixture Markov Games: Kernel Function Approximation and Correlated Equilibrium},
  author={Chris Junchi Li and Dongruo Zhou and Quanquan Gu and Michael I. Jordan},
  journal={ArXiv},
  year={2022},
  volume={abs/2208.05363}
}
We consider learning Nash equilibria in two-player zero-sum Markov games with nonlinear function approximation, where the action-value function is approximated by a function in a reproducing kernel Hilbert space (RKHS). The key challenge is how to perform exploration in this high-dimensional function space. We propose a novel online learning algorithm that finds a Nash equilibrium by minimizing the duality gap. At the core of our algorithm are upper and lower confidence bounds that are derived based on…
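
As an illustration of the confidence-bound construction (a minimal sketch assuming an RBF kernel, a fixed regularizer, and a constant bonus scale; not the paper's exact algorithm), kernel ridge regression yields both a point estimate of the action-value function and a width term from which upper and lower confidence bounds can be formed:

```python
# Illustrative sketch: kernel ridge estimate of Q-values with UCB/LCB bonuses.
# The kernel choice, regularizer lam, and bonus scale beta are assumptions of this sketch.
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    # Pairwise RBF kernel between rows of X and rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

def kernel_ucb_lcb(X_train, y_train, X_query, lam=1.0, beta=1.0):
    """Return (mean, ucb, lcb) of the kernel ridge regression estimate at X_query."""
    K = rbf_kernel(X_train, X_train)
    k_q = rbf_kernel(X_query, X_train)
    A = K + lam * np.eye(len(X_train))
    mean = k_q @ np.linalg.solve(A, y_train)
    # Posterior-style width used as the exploration bonus (k(x, x) = 1 for the RBF kernel).
    var = 1.0 - np.einsum('ij,ij->i', k_q, np.linalg.solve(A, k_q.T).T)
    bonus = beta * np.sqrt(np.maximum(var, 0.0))
    return mean, mean + bonus, mean - bonus
```

With such estimates in hand, the duality gap at a state can be assessed by letting the max player best-respond to the upper bound and the min player best-respond to the lower bound, which mirrors the duality-gap minimization described above.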

References

SHOWING 1-10 OF 54 REFERENCES

Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation

It is shown that Nash-UCRL-VTR provably achieves an $\tilde{O}(dH\sqrt{T})$ regret, where d is the dimension of the linear function class, H is the horizon of the game, and T is the total number of steps in the game, which suggests the optimality of the algorithm.
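
For context, bounds of this form are typically stated for the cumulative duality gap over episodes; in notation assumed here rather than taken from the paper, with $K = T/H$ episodes, policies $(\mu^k, \nu^k)$ in episode $k$, and $\dagger$ denoting a best response:

```latex
\mathrm{Regret}(K) \;=\; \sum_{k=1}^{K} \Big( V_1^{\dagger,\,\nu^k}(s_1) \;-\; V_1^{\mu^k,\,\dagger}(s_1) \Big),
\qquad K = T/H .
```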

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

This work develops provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games with simultaneous moves and proposes an optimistic variant of the least-squares minimax value iteration algorithm.
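
The key subroutine of minimax value iteration is solving a small matrix game at each state and step. A hedged sketch of that subroutine (assuming SciPy's linprog and a row-maximizer/column-minimizer payoff convention; not code from the paper):

```python
# Compute the row player's maximin strategy and the game value of payoff matrix A
# by linear programming: maximize v subject to x >= 0, sum(x) = 1, and
# sum_i x_i * A[i, j] >= v for every column j.
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(A):
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                  # maximize v  <=>  minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])     # v - sum_i x_i A[i, j] <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])                        # x is a probability vector
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub, b_ub, A_eq, b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]                   # (maximin strategy, game value)
```

In an optimistic variant, the payoff matrix passed in at each state is built from the optimistic Q-value estimates rather than the true values.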

Towards General Function Approximation in Zero-Sum Markov Games

In the decoupled setting, where the agent controls a single player and plays against an arbitrary opponent, a new model-free algorithm is proposed, and it is proved that its sample complexity can be bounded by a generalization of the Witness rank to Markov games.

Feature-Based Q-Learning for Two-Player Stochastic Games

This work proposes a two-player Q-learning algorithm for approximating the Nash equilibrium strategy via sampling and proves that the algorithm is guaranteed to find an $\epsilon$-optimal strategy using no more than $\tilde{\mathcal{O}}(K/(\epsilon^{2}(1-\gamma)^{4}))$ samples with high probability.
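
To get a feel for how this bound scales, one can plug in example values (the numbers, and the omission of constants and logarithmic factors, are illustrative assumptions):

```python
# Hypothetical plug-in of the stated sample-complexity bound, ignoring
# constants and log factors: K features, target accuracy eps, discount gamma.
K, eps, gamma = 100, 0.1, 0.9
samples = K / (eps ** 2 * (1 - gamma) ** 4)
print(f"{samples:.1e}")   # 1.0e+08 transitions
```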

Learning Nash Equilibrium for General-Sum Markov Games from Batch Data

This work gives a new definition of $\epsilon$-Nash equilibrium in MGs that captures a strategy's quality in multiplayer games, and introduces a neural network architecture named NashNetwork that successfully learns a Nash equilibrium in a generic multiplayer general-sum turn-based MG.

Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity

The sample complexity of solving discounted two-player turn-based zero-sum stochastic games is settled up to polylogarithmic factors by showing how to generalize a near-optimal Q-learning-based algorithm for MDPs, in particular that of Sidford et al. (2018), to two-player strategy computation.

Value Function Approximation in Zero-Sum Markov Games

The viability of value function approximation for Markov games is demonstrated by using the least-squares policy iteration (LSPI) algorithm to learn good policies for a soccer domain and a flow control problem.
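
The evaluation step at the core of LSPI is LSTD-Q, a least-squares fit of the current policy's Q-function from a batch of transitions. A minimal sketch under assumed feature and data layouts (not the paper's exact setup):

```python
# LSTD-Q: fit Q(s, a) ~= phi(s, a) @ w for the policy being evaluated.
import numpy as np

def lstdq(phi, phi_next, rewards, gamma=0.95, reg=1e-6):
    """phi: (n, d) features of observed (s, a); phi_next: (n, d) features of
    (s', a') with a' chosen by the evaluated policy; rewards: (n,) one-step rewards."""
    A = phi.T @ (phi - gamma * phi_next) + reg * np.eye(phi.shape[1])
    b = phi.T @ rewards
    return np.linalg.solve(A, b)
```

LSPI alternates this fit with a policy-improvement step (greedy in the single-agent case, a minimax step in the zero-sum game case) until the weight vector stops changing.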

Online Reinforcement Learning in Stochastic Games

The UCSG algorithm is proposed, which achieves sublinear regret relative to the game value when competing with an arbitrary opponent; this result improves upon previous ones in the same setting.

Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design

This work analyzes GP-UCB, an intuitive upper-confidence-bound-based algorithm, and bounds its cumulative regret in terms of the maximal information gain, establishing a novel connection between GP optimization and experimental design and obtaining explicit sublinear regret bounds for many commonly used covariance functions.
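
A compact sketch of GP-UCB over a finite candidate set (an RBF kernel and a constant exploration coefficient beta are assumed here, whereas the paper sets the coefficient from the information gain):

```python
# GP-UCB sketch: query the candidate with the largest posterior mean plus
# beta * posterior standard deviation, then update the GP with the noisy value.
import numpy as np

def rbf(X, Y, ls=0.2):
    return np.exp(-((X[:, None] - Y[None, :]) ** 2) / (2 * ls ** 2))

def gp_ucb(f, candidates, rounds=30, noise=0.1, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    X, y = [], []
    for _ in range(rounds):
        if not X:
            x = candidates[rng.integers(len(candidates))]    # no data yet: random query
        else:
            Xa, ya = np.array(X), np.array(y)
            K = rbf(Xa, Xa) + noise ** 2 * np.eye(len(Xa))
            k = rbf(candidates, Xa)
            mu = k @ np.linalg.solve(K, ya)                   # posterior mean
            var = 1.0 - np.einsum('ij,ji->i', k, np.linalg.solve(K, k.T))
            x = candidates[np.argmax(mu + beta * np.sqrt(np.maximum(var, 0.0)))]
        X.append(x)
        y.append(f(x) + noise * rng.standard_normal())
    return np.array(X), np.array(y)
```

For example, `gp_ucb(lambda x: np.sin(3 * x), np.linspace(0.0, 2.0, 200))` concentrates its queries near the maximizer of the sine objective after a handful of rounds.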

On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game

This work establishes the first provably efficient reward-free RL algorithm with kernel and neural function approximators, and designs exploration and planning algorithms for both single-agent MDPs and zero-sum Markov games.
...