Offline Learning in Markov Games with General Function Approximation
@article{Zhang2023OfflineLI,
  title   = {Offline Learning in Markov Games with General Function Approximation},
  author  = {Yuheng Zhang and Yu Bai and Nan Jiang},
  journal = {ArXiv},
  year    = {2023},
  volume  = {abs/2302.02571}
}
We study offline multi-agent reinforcement learning (RL) in Markov games, where the goal is to learn an approximate equilibrium -- such as a Nash equilibrium or a (coarse) correlated equilibrium -- from an offline dataset pre-collected from the game. Existing works consider relatively restricted tabular or linear models and handle each equilibrium concept separately. In this work, we provide the first framework for sample-efficient offline learning in Markov games under general function approximation…
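For concreteness, these equilibrium notions are all measured through a single performance gap. A minimal formulation in standard Markov-game notation (assumed here for illustration, not quoted from the paper):

```latex
% Equilibrium gap for an m-player Markov game.
% V_i^{\pi} is player i's value under the joint policy \pi;
% \mu_i ranges over player i's unilateral deviations.
\[
\mathrm{Gap}(\pi) \;=\; \max_{i \in [m]}
  \Bigl( \max_{\mu_i} V_i^{\mu_i \times \pi_{-i}} \;-\; V_i^{\pi} \Bigr)
\]
% \pi is an \epsilon-Nash equilibrium (for product policies) or an
% \epsilon-CCE (for correlated policies) when \mathrm{Gap}(\pi) \le \epsilon.
```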
46 References
Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game
- Computer Science · ArXiv
- 2022
This paper proposes a new pessimism-based algorithm for offline linear MDPs, extends the techniques to two-player zero-sum Markov games with linear function approximation, and establishes a new performance lower bound for MGs.
Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium
- Computer Science · COLT
- 2020
This work develops provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games with simultaneous moves and proposes an optimistic variant of the least-squares minimax value iteration algorithm.
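At each state, least-squares minimax value iteration reduces to solving a zero-sum matrix game on the estimated Q-values. A minimal sketch of that inner step via linear programming, assuming a payoff matrix `Q` for the max player (illustrative names, not this paper's code):

```python
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(Q):
    """Maximin strategy and value of the zero-sum game with payoff Q.

    Q[a, b] is the max player's payoff when the players simultaneously
    pick actions a and b. Solves max_x min_b sum_a x[a] * Q[a, b].
    """
    m, n = Q.shape
    # Variables: x_1..x_m (mixed strategy) and v (game value); maximize v.
    c = np.zeros(m + 1)
    c[-1] = -1.0                                  # linprog minimizes
    # For every opponent action b:  v - sum_a x[a] Q[a, b] <= 0.
    A_ub = np.hstack([-Q.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1)); A_eq[0, :m] = 1  # probabilities sum to 1
    bounds = [(0.0, 1.0)] * m + [(None, None)]    # v is unbounded
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=bounds, method="highs")
    return res.x[:m], res.x[-1]

# Matching pennies: value 0, uniform strategy.
x, v = solve_matrix_game(np.array([[1.0, -1.0], [-1.0, 1.0]]))
```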
Almost Optimal Algorithms for Two-player Zero-Sum Linear Mixture Markov Games
- Mathematics · ALT
- 2022
We study reinforcement learning for two-player zero-sum Markov games with simultaneous moves in the finite-horizon setting, where the transition kernel of the underlying Markov games can be…
The Complexity of Markov Equilibrium in Stochastic Games
- Computer Science · ArXiv
- 2022
We show that computing approximate stationary Markov coarse correlated equilibria (CCE) in general-sum stochastic games is computationally intractable, even when there are two players, the game is…
Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games
- Computer Science · Dynamic Games and Applications
- 2022
An algorithm in which each agent independently runs optimistic V-learning (a variant of Q-learning) to efficiently explore the unknown environment, using a stabilized online mirror descent (OMD) subroutine for policy updates; this appears to be the first sample-complexity result for learning in generic general-sum Markov games.
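The (unstabilized) core of the OMD subroutine is a multiplicative-weights update of each agent's policy against its value estimates. A minimal sketch under that simplification (variable names are illustrative):

```python
import numpy as np

def omd_update(policy, q_estimates, lr):
    """One mirror-descent step with the negative-entropy regularizer
    (i.e., Hedge / exponentiated gradient) on a single state's policy."""
    logits = np.log(policy + 1e-12) + lr * q_estimates
    logits -= logits.max()                 # numerical stability
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum()

pi = np.ones(3) / 3                        # start from uniform
pi = omd_update(pi, np.array([1.0, 0.5, 0.0]), lr=0.1)
```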
Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets
- Computer Science · ICML
- 2022
A pessimism-based algorithm is proposed, dubbed pessimistic minimax value iteration (PMVI), which overcomes distributional shift by constructing pessimistic estimates of the value functions for both players and outputs a policy pair by solving Nash equilibria based on the two value functions.
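The two-sided pessimism in PMVI can be sketched as a per-state step: a lower-confidence payoff matrix for the max player, an upper-confidence one for the min player, and each player's strategy extracted from its own pessimistic game. The bonus `gamma` and the matrix-game solver are assumed given (see the sketch after the COLT 2020 entry above); this is an illustration, not the paper's exact construction:

```python
import numpy as np

def pmvi_step(q_hat, gamma, solve_matrix_game):
    """q_hat: estimated payoff matrix for the max player, shape (m, n);
    gamma: elementwise uncertainty bonus of the same shape."""
    q_lower = q_hat - gamma                    # pessimistic for max player
    q_upper = q_hat + gamma                    # pessimistic for min player
    x, v_lower = solve_matrix_game(q_lower)    # max player's policy
    y, neg_v_upper = solve_matrix_game(-q_upper.T)  # min player's policy
    return x, y, v_lower, -neg_v_upper         # bracket on the game value
```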
Towards General Function Approximation in Zero-Sum Markov Games
- Computer Science · ICLR
- 2022
In the decoupled setting where the agent controls a single player and plays against an arbitrary opponent, a new model-free algorithm is proposed and it is proved that sample complexity can be bounded by a generalization of Witness rank to Markov games.
Offline Reinforcement Learning with Realizability and Single-policy Concentrability
- Mathematics, Computer Science · COLT
- 2022
This paper analyzes a simple algorithm based on the primal-dual formulation of MDPs, where the dual variables are modeled using a density-ratio function against offline data, and shows that the algorithm enjoys polynomial sample complexity under only realizability and single-policy concentrability.
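The primal-dual objective evaluates a candidate value function through a density-ratio-weighted average Bellman residual on the offline data. A minimal sketch of that empirical Lagrangian in the discounted single-agent setting (the names and exact form are assumptions for illustration):

```python
import numpy as np

def empirical_lagrangian(w, v, s, a, r, s_next, s0, gamma):
    """L(v, w) = (1 - gamma) E[v(s0)] + E_data[w(s, a) (r + gamma v(s') - v(s))].

    w -- dual variable: density ratio of the target policy's occupancy
         against the data distribution; v -- candidate value function;
    s, a, r, s_next -- offline transitions; s0 -- sampled initial states.
    """
    residual = r + gamma * v(s_next) - v(s)    # average Bellman residual
    return (1 - gamma) * np.mean(v(s0)) + np.mean(w(s, a) * residual)
```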
Is Pessimism Provably Efficient for Offline RL?
- Computer Science · ICML
- 2021
A pessimistic variant of the value iteration algorithm (PEVI) is proposed, which incorporates an uncertainty quantifier as the penalty function, and a data-dependent upper bound on the suboptimality of PEVI is established for general Markov decision processes (MDPs).
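For linear MDPs, the uncertainty quantifier in PEVI is commonly the elliptical bonus Γ(s, a) = β · sqrt(φ(s, a)ᵀ Λ⁻¹ φ(s, a)), where Λ is the regularized feature covariance of the dataset. A minimal sketch, with illustrative names:

```python
import numpy as np

def elliptical_penalty(phi_data, phi_query, beta, reg=1.0):
    """Uncertainty bonus subtracted from Q-estimates under pessimism.

    phi_data  -- features of the offline (s, a) pairs, shape (N, d)
    phi_query -- features at which to evaluate the bonus, shape (M, d)
    beta      -- confidence-width multiplier
    """
    d = phi_data.shape[1]
    cov = phi_data.T @ phi_data + reg * np.eye(d)    # Lambda
    cov_inv = np.linalg.inv(cov)
    quad = np.einsum("md,de,me->m", phi_query, cov_inv, phi_query)
    return beta * np.sqrt(quad)                      # Gamma(s, a)
```

A pessimistic Q-estimate is then the least-squares fit minus this penalty, which (for a suitable β) makes it a high-probability lower bound on the true value.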
When is Offline Two-Player Zero-Sum Markov Game Solvable?
- Computer Science
- 2022
A new assumption named unilateral concentration is proposed, and a pessimism-type algorithm is designed that achieves minimax sample complexity without any modification for two widely studied settings: datasets with the uniform concentration assumption and turn-based Markov games.