# Offline Learning in Markov Games with General Function Approximation

@article{Zhang2023OfflineLI, title={Offline Learning in Markov Games with General Function Approximation}, author={Yuheng Zhang and Yunru Bai and Nan Jiang}, journal={ArXiv}, year={2023}, volume={abs/2302.02571} }

We study offline multi-agent reinforcement learning (RL) in Markov games, where the goal is to learn an approximate equilibrium -- such as a Nash equilibrium or a (Coarse) Correlated Equilibrium -- from an offline dataset pre-collected from the game. Existing works consider relatively restricted tabular or linear models and handle each equilibrium concept separately. In this work, we provide the first framework for sample-efficient offline learning in Markov games under general function approximation…
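As a point of reference, these solution concepts are commonly measured by a single gap quantity. The notation below is standard in the multi-agent RL literature, not taken from the paper: for a joint policy $\pi$ in an $m$-player Markov game with per-player value functions $V_i^{\pi}$,

```latex
\mathrm{Gap}(\pi) \;=\; \max_{i \in [m]} \Bigl( \max_{\pi_i'} V_i^{\pi_i', \pi_{-i}} \;-\; V_i^{\pi} \Bigr),
```

and $\pi$ is an $\epsilon$-approximate equilibrium when $\mathrm{Gap}(\pi) \le \epsilon$. The class of allowed deviations $\pi_i'$ distinguishes the concepts: independent deviations give a Nash equilibrium, while for (Coarse) Correlated Equilibria the players share a correlated policy and the deviation may (CE) or may not (CCE) condition on the recommended action.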

## 46 References

### Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game

- Computer Science, ArXiv
- 2022

This paper proposes a new pessimism-based algorithm for offline linear MDPs, extends the techniques to two-player zero-sum Markov games with linear function approximation, and establishes a new performance lower bound for MGs.

### Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

- Computer Science, COLT
- 2020

This work develops provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games with simultaneous moves and proposes an optimistic variant of the least-squares minimax value iteration algorithm.

### Almost Optimal Algorithms for Two-player Zero-Sum Linear Mixture Markov Games

- Mathematics, ALT
- 2022

We study reinforcement learning for two-player zero-sum Markov games with simultaneous moves in the finite-horizon setting, where the transition kernel of the underlying Markov games can be…

### The Complexity of Markov Equilibrium in Stochastic Games

- Computer Science, ArXiv
- 2022

We show that computing approximate stationary Markov coarse correlated equilibria (CCE) in general-sum stochastic games is computationally intractable, even when there are two players, the game is…

### Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games

- Computer Science, Dynamic Games and Applications
- 2022

An algorithm is proposed in which each agent independently runs optimistic V-learning (a variant of Q-learning) to efficiently explore the unknown environment, while using a stabilized online mirror descent (OMD) subroutine for policy updates; this appears to be the first sample-complexity result for learning in generic general-sum Markov games.

### Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets

- Computer Science, ICML
- 2022

A pessimism-based algorithm, dubbed pessimistic minimax value iteration (PMVI), is proposed; it overcomes distributional shift by constructing pessimistic estimates of the value functions for both players and outputs a policy pair by solving for a Nash equilibrium of the two value functions.
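The equilibrium-solving subroutine such methods invoke at each state reduces to finding a Nash equilibrium of a zero-sum matrix game. A minimal, hypothetical sketch of that step using classical fictitious play (a standard stand-in; not the paper's specific solver):

```python
import numpy as np

def fictitious_play(payoff, iters=20000):
    """Approximate a Nash equilibrium of a two-player zero-sum matrix game
    via fictitious play.  payoff[i, j] is the row player's payoff; the
    column player receives -payoff[i, j].  Returns the time-averaged
    (empirical) strategies, which converge to an equilibrium in zero-sum
    games (Robinson, 1951)."""
    m, n = payoff.shape
    row_counts = np.zeros(m)
    col_counts = np.zeros(n)
    # Start each player from an arbitrary pure strategy.
    row_counts[0] += 1
    col_counts[0] += 1
    for _ in range(iters):
        # Each player best-responds to the opponent's empirical mixture.
        row_counts[np.argmax(payoff @ col_counts)] += 1
        col_counts[np.argmin(row_counts @ payoff)] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()
```

On rock-paper-scissors the averaged strategies of both players drift toward the uniform mixture, the game's unique equilibrium.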

### Towards General Function Approximation in Zero-Sum Markov Games

- Computer Science, ICLR
- 2022

In the decoupled setting where the agent controls a single player and plays against an arbitrary opponent, a new model-free algorithm is proposed and it is proved that sample complexity can be bounded by a generalization of Witness rank to Markov games.

### Offline Reinforcement Learning with Realizability and Single-policy Concentrability

- Mathematics, Computer Science, COLT
- 2022

This paper analyzes a simple algorithm based on the primal-dual formulation of MDPs, where the dual variables are modeled using a density-ratio function against the offline data, and shows that the algorithm enjoys polynomial sample complexity under only realizability and single-policy concentrability.

### Is Pessimism Provably Efficient for Offline RL?

- Computer Science, ICML
- 2021

A pessimistic variant of the value iteration algorithm (PEVI) is proposed, which incorporates an uncertainty quantifier as a penalty function; a data-dependent upper bound on the suboptimality of PEVI is established for general Markov decision processes (MDPs).
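The core idea, subtracting an uncertainty penalty from each Bellman backup so that poorly covered state-action pairs look unattractive, can be sketched in a hypothetical tabular instantiation (the count-based bonus and all names below are illustrative assumptions, not the paper's exact construction):

```python
import numpy as np

def pessimistic_value_iteration(counts, rewards, horizon, beta):
    """Sketch of pessimism-based value iteration on an offline dataset.

    counts[s, a, s2] -- observed transition counts from (s, a) to s2
    rewards[s, a]    -- empirical mean rewards
    beta             -- scale of the count-based uncertainty quantifier
    """
    S, A, _ = counts.shape
    n_sa = counts.sum(axis=2)  # visits to each (s, a) pair
    # Empirical transition model; unvisited pairs fall back to uniform.
    p_hat = np.where(n_sa[..., None] > 0,
                     counts / np.maximum(n_sa[..., None], 1),
                     1.0 / S)
    V = np.zeros(S)
    for _ in range(horizon):
        bonus = beta / np.sqrt(np.maximum(n_sa, 1))  # uncertainty penalty
        Q = rewards + p_hat @ V - bonus              # pessimistic backup
        Q = np.clip(Q, 0.0, horizon)                 # keep values in range
        V = Q.max(axis=1)
    return V, Q
```

The effect is that an action with few offline samples receives a large penalty, so the greedy policy avoids it even when its empirical reward looks competitive, which is exactly how pessimism counters distributional shift.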

### When is Offline Two-Player Zero-Sum Markov Game Solvable?

- Computer Science
- 2022

A new assumption named unilateral concentration is proposed, and a pessimism-type algorithm is designed that achieves minimax sample complexity without any modification in two widely studied settings: datasets satisfying a uniform concentration assumption, and turn-based Markov games.