# Almost Optimal Algorithms for Two-player Zero-Sum Linear Mixture Markov Games

We study reinforcement learning for two-player zero-sum Markov games with simultaneous moves in the finite-horizon setting, where the transition kernel of the underlying Markov game can be parameterized by a linear function over the current state, both players’ actions, and the next state. In particular, we assume that we can control both players and aim to find the Nash equilibrium by minimizing the duality gap. We propose an algorithm, Nash-UCRL, based on the principle “Optimism-in-Face-of…
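
To make the two central objects of the abstract concrete, the following is a sketch of how a linear mixture transition kernel and the duality gap are commonly written. The notation is an assumption for illustration and is not taken from the text above: $\phi$ is a known $d$-dimensional feature map, $\theta_h$ is the unknown parameter at step $h$, $\mu$ is the max-player's policy, $\nu$ is the min-player's policy, and $s_1$ is the initial state.

$$
\mathbb{P}_h(s' \mid s, a, b) = \langle \phi(s' \mid s, a, b), \theta_h \rangle, \qquad \theta_h \in \mathbb{R}^d,
$$

$$
\mathrm{Gap}(\mu, \nu) = V_1^{*,\nu}(s_1) - V_1^{\mu,*}(s_1), \qquad V_1^{*,\nu} = \max_{\mu'} V_1^{\mu',\nu}, \quad V_1^{\mu,*} = \min_{\nu'} V_1^{\mu,\nu'}.
$$

Under these conventions the gap is nonnegative, since $V_1^{*,\nu}(s_1) \ge V_1^{\mu,\nu}(s_1) \ge V_1^{\mu,*}(s_1)$, and it equals zero exactly when neither player can improve by deviating at $s_1$, which is why driving it to zero corresponds to finding a Nash equilibrium.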
