• Corpus ID: 247682346

# Almost Optimal Algorithms for Two-player Zero-Sum Linear Mixture Markov Games

@inproceedings{Chen2021AlmostOA,
title={Almost Optimal Algorithms for Two-player Zero-Sum Linear Mixture Markov Games},
author={Zixiang Chen and Dongruo Zhou and Quanquan Gu},
booktitle={International Conference on Algorithmic Learning Theory},
year={2021}
}
• Published in
International Conference on Algorithmic Learning Theory
15 February 2021
• Mathematics
We study reinforcement learning for two-player zero-sum Markov games with simultaneous moves in the finite-horizon setting, where the transition kernel of the underlying Markov games can be parameterized by a linear function over the current state, both players' actions and the next state. In particular, we assume that we can control both players and aim to find the Nash Equilibrium by minimizing the duality gap. We propose an algorithm Nash-UCRL based on the principle "Optimism-in-Face-of…
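The duality-gap criterion from the abstract can be illustrated on a one-shot (matrix) zero-sum game. The sketch below is a hypothetical illustration, not the paper's Nash-UCRL algorithm: for a fixed policy pair, each player's best response is a pure strategy, so the gap reduces to a max over rows minus a min over columns; it is zero exactly at a Nash equilibrium.

```python
import numpy as np

def duality_gap(A, mu, nu):
    """Duality gap of the policy pair (mu, nu) in the zero-sum matrix game A.

    The row player maximizes mu^T A nu, the column player minimizes it.
    The gap is the row player's best-response value against nu minus the
    column player's best-response value against mu; it vanishes exactly
    at a Nash equilibrium.
    """
    row_best = np.max(A @ nu)   # row player's best response to nu
    col_best = np.min(mu @ A)   # column player's best response to mu
    return row_best - col_best

# Matching pennies: the uniform policy pair is the unique Nash equilibrium.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
uniform = np.array([0.5, 0.5])
print(duality_gap(A, uniform, uniform))  # → 0.0
```

In the paper's episodic Markov-game setting the same quantity is computed over value functions rather than a single payoff matrix, but the stopping criterion has the same shape.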
6 Citations
• Computer Science
• 2022
Focusing on non-stationary Markov games, a fast learning algorithm called Q-FTRL and an adaptive sampling scheme that leverage the optimism principle in online adversarial learning (particularly the Follow-the-Regularized-Leader (FTRL) method) are developed.
• Computer Science
ICML
• 2022
This paper presents the first line of algorithms that require only episodes of play to reach an $\epsilon$-approximate Nash equilibrium in two-player zero-sum games, and achieves this sample complexity via two new algorithms: Balanced Online Mirror Descent and Balanced Counterfactual Regret Minimization.
• Computer Science
ICML
• 2022
H-MARL (Hallucinated Multi-Agent Reinforcement Learning), a novel sample-efficient algorithm that can balance exploration and exploitation and improve performance compared to non-optimistic exploration methods, is proposed.
• Computer Science
ArXiv
• 2022
This paper shows that using a single policy to guide exploration across all agents is sufficient and provably near-optimal for incorporating parallelism during the exploration phase and that this simple procedure is near-minimax optimal in the reward-free setting for linear MDPs.
• Computer Science
ArXiv
• 2022
Focusing on non-stationary zero-sum Markov games, a learning algorithm called Nash-Q-FTRL and an adaptive sampling scheme that leverage the optimism principle in adversarial learning are developed, with a delicate design of bonus terms that ensure certain decomposability under the FTRL dynamics.
• Computer Science
ArXiv
• 2022
An algorithm framework is proposed for two-player zero-sum Markov Games in the full-information setting, where each iteration consists of a policy update step at each state using a certain matrix game algorithm, and a value update step with a certain learning rate.

## References

Showing 1–10 of 56 references

• Computer Science
COLT
• 2020
This work develops provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games with simultaneous moves and proposes an optimistic variant of the least-squares minimax value iteration algorithm.
• Computer Science
ICML
• 2015
This paper provides a novel and unified error propagation analysis in $L_p$-norm of three well-known algorithms adapted to Stochastic Games and shows that it can achieve a stationary policy which is $(2\gamma\epsilon+\epsilon')/(1-\gamma)^{2}$-optimal.
• Economics, Computer Science
AISTATS
• 2017
This work proposes a new definition of $\epsilon$-Nash equilibrium in MGs that captures the quality of a strategy in multiplayer games, and introduces a neural network architecture named NashNetwork that successfully learns a Nash equilibrium in a generic multiplayer general-sum turn-based MG.
• Computer Science, Economics
J. Mach. Learn. Res.
• 2003
This work extends Q-learning to a noncooperative multiagent context, using the framework of general-sum stochastic games, and implements an online version of Nash Q-learning that balances exploration with exploitation, yielding improved performance.
• Computer Science
AISTATS
• 2020
The sampling complexity of solving discounted two-player turn-based zero-sum stochastic games up to polylogarithmic factors is settled by showing how to generalize a near-optimal Q-learning based algorithm for MDPs, in particular Sidford et al. (2018), to two-player strategy computation algorithms.
• Computer Science
ArXiv
• 2019
This work proposes a two-player Q-learning algorithm for approximating the Nash equilibrium strategy via sampling and proves that the algorithm is guaranteed to find an $\epsilon$-optimal strategy using no more than $\tilde{\mathcal{O}}(K/(\epsilon^{2}(1-\gamma)^{4}))$ samples with high probability.
• Computer Science
NIPS
• 2017
The UCSG algorithm is proposed that achieves a sublinear regret compared to the game value when competing with an arbitrary opponent, and this result improves previous ones under the same setting.
• Computer Science
AISTATS
• 2016
Non-stationary reinforcement learning algorithms and their theoretical guarantees are extended to the case of discounted zero-sum Markov Games (MGs), and it is shown that their performance mostly depends on the nature of the propagation error.
• Computer Science
UAI
• 2021
This work proves that the plug-in solver approach, probably the most natural reinforcement learning algorithm, achieves minimax sample complexity for turn-based stochastic games (TBSG) by utilizing a "simulator" that allows sampling from an arbitrary state-action pair.
• Computer Science
NeurIPS
• 2020
An optimistic variant of the Nash Q-learning algorithm with sample complexity $\tilde{\mathcal{O}}(SAB)$ is presented, together with a new *Nash V-learning* algorithm, which matches the information-theoretic lower bound in all problem-dependent parameters except for a polynomial factor of the length of each episode.