# Learning to Play against Any Mixture of Opponents

```bibtex
@article{Smith2020LearningTP,
  title={Learning to Play against Any Mixture of Opponents},
  author={Max O. Smith and Thomas W. Anthony and Yongzhao Wang and Michael P. Wellman},
  journal={ArXiv},
  year={2020},
  volume={abs/2009.14180}
}
```

Intuitively, experience playing against one mixture of opponents in a given domain should be relevant for a different mixture in the same domain. We propose a transfer learning method, Q-Mixing, that starts by learning Q-values against each pure-strategy opponent. Then a Q-value for any distribution of opponent strategies is approximated by appropriately averaging the separately learned Q-values. From these components, we construct policies against all opponent mixtures without any further…
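The averaging idea from the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `q_mixing` and the example numbers are made up, and the weights here are the prior mixture probabilities (the paper also considers refinements such as weighting by an opponent classifier's posterior).

```python
import numpy as np

def q_mixing(q_values, opponent_mixture):
    """Combine per-opponent Q-values into Q-values against a mixture.

    q_values: array of shape (num_opponents, num_actions), where row i
        holds Q_i(s, .) learned against pure-strategy opponent i.
    opponent_mixture: probability vector over the opponents.
    Returns the mixture-weighted Q-values for the current state.
    """
    return opponent_mixture @ q_values

# Example: two pure-strategy opponents, three actions.
q_vs_opp0 = np.array([1.0, 0.0, 2.0])  # Q-values learned vs. opponent 0
q_vs_opp1 = np.array([0.0, 3.0, 1.0])  # Q-values learned vs. opponent 1
mixed_q = q_mixing(np.stack([q_vs_opp0, q_vs_opp1]), np.array([0.5, 0.5]))
best_action = int(np.argmax(mixed_q))  # greedy policy against the 50/50 mixture
```

Note that the greedy action against the mixture (here, weighing both opponents equally) can differ from the greedy action against either pure opponent, which is why the mixture Q-values are needed at all.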

## 6 Citations

Iterative Empirical Game Solving via Single Policy Best Response

- Computer Science, ICLR
- 2021

Two variations of PSRO are introduced that reduce the amount of simulation PSRO requires during training, while producing equivalent or better solutions to the game.

Generalized Beliefs for Cooperative AI

- Computer Science, ICML
- 2022

This work proposes a belief learning model that can maintain beliefs over rollouts of policies not seen at training time, and can thus decode and adapt to novel conventions at test time and shows how this model can improve ad-hoc teamplay.

Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games

- Computer Science, ICML
- 2022

It is shown that the resulting conditional policies incorporate prior information about their opponents effectively, enabling near optimal returns against arbitrary mixture policies in a game with tractable best-responses.

NeuPL: Neural Population Learning

- Computer Science, ArXiv
- 2022

This work proposes Neural Population Learning (NeuPL) and shows that novel strategies become more accessible, not less, as the neural population expands, and offers convergence guarantees to a population of best-responses under mild assumptions.

A Game-Theoretic Approach for Improving Generalization Ability of TSP Solvers

- Computer Science, ArXiv
- 2021

A two-player zero-sum framework is proposed between a trainable Solver and a Data Generator to improve the generalization ability of deep-learning-based solvers for the Traveling Salesman Problem (TSP).

Deep Interactive Bayesian Reinforcement Learning via Meta-Learning

- Computer Science, AAMAS
- 2021

This work proposes to meta-learn approximate belief inference and Bayes-optimal behaviour for a given prior, and shows empirically that this approach outperforms existing methods that use a model-free approach, sample from the approximate posterior, maintain memory-free models of others, or do not fully utilise the known structure of the environment.

## References

Showing 1-10 of 65 references

A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning

- Computer Science, NIPS
- 2017

An algorithm is described, based on approximate best responses to mixtures of policies generated using deep reinforcement learning and empirical game-theoretic analysis to compute meta-strategies for policy selection, which generalizes previous algorithms such as InRL.

Deep Reinforcement Learning with Double Q-Learning

- Computer Science, AAAI
- 2016

This paper proposes a specific adaptation to the DQN algorithm and shows that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
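The adaptation this blurb refers to decouples action selection from action evaluation in the bootstrap target. A minimal sketch of that target computation (function name and example values are illustrative, not from the paper):

```python
import numpy as np

def double_q_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN bootstrap target: select the next action with the online
    network, but evaluate it with the target network, which reduces the
    overestimation bias of taking a max over noisy Q-estimates."""
    a_star = int(np.argmax(q_online_next))           # selection: online net
    bootstrap = 0.0 if done else gamma * q_target_next[a_star]  # evaluation: target net
    return reward + bootstrap

# The online net overestimates action 0; the target net evaluates it soberly.
target = double_q_target(
    reward=1.0, gamma=0.9,
    q_online_next=np.array([5.0, 2.0]),   # online net selects action 0...
    q_target_next=np.array([1.0, 4.0]),   # ...which the target net values at 1.0
    done=False,
)
```

Standard DQN would instead bootstrap from `max(q_target_next)`, inheriting whichever action the noisy maximum happens to favor.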

Reinforcement Learning: An Introduction

- Computer Science, IEEE Transactions on Neural Networks
- 2005

This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the field's intellectual foundations to the most recent developments and applications.

The Hanabi Challenge: A New Frontier for AI Research

- Computer Science, Artif. Intell.
- 2020

Human-level performance in 3D multiplayer games with population-based reinforcement learning

- Computer Science, Science
- 2019

A tournament-style evaluation is used to demonstrate that an agent can achieve human-level performance in a three-dimensional multiplayer first-person video game, Quake III Arena in Capture the Flag mode, using only pixels and game points scored as input.

QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

- Computer Science, ICML
- 2018

QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations, and structurally enforces that the joint action-value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning.
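The monotonicity constraint can be illustrated with a toy mixer. This is a deliberately simplified sketch: real QMIX generates the mixing weights from a state-conditioned hypernetwork, whereas here the weights are fixed inputs, and the function name is made up.

```python
import numpy as np

def qmix_joint_value(per_agent_qs, weights, bias):
    """Toy monotonic mixing of per-agent Q-values into a joint value.

    Taking the absolute value of the weights guarantees dQ_tot/dQ_i >= 0,
    so each agent's local argmax also maximises the joint action-value.
    """
    w = np.abs(np.asarray(weights))  # non-negative weights => monotonicity
    return float(w @ np.asarray(per_agent_qs) + bias)

# A negative raw weight is clamped to 0.5; the joint value still rises
# whenever any individual agent's Q-value rises.
q_tot = qmix_joint_value([1.0, 2.0], weights=[-0.5, 1.0], bias=0.1)
```

Monotonicity is what makes decentralised execution consistent with the centrally trained joint value: maximising each `Q_i` independently maximises `Q_tot`.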

Cooperating with Unknown Teammates in Complex Domains: A Robot Soccer Case Study of Ad Hoc Teamwork

- Computer Science, AAAI
- 2015

A new algorithm, PLASTIC–Policy, is introduced that builds on an existing ad hoc teamwork approach and learns policies to cooperate with past teammates and reuses these policies to quickly adapt to new teammates.

State Abstraction Discovery from Irrelevant State Variables

- Computer Science, IJCAI
- 2005

This work proposes an algorithm for the automatic discovery of state abstraction from policies learned in one domain for use in other domains that have similar structure and introduces a novel condition for state abstraction in terms of the relevance of state features to optimal behavior.

Extending Q-Learning to General Adaptive Multi-Agent Systems

- Mathematics, NIPS
- 2003

This paper proposes a fundamentally different approach to Q-Learning, dubbed Hyper-Q, in which values of mixed strategies rather than base actions are learned, and in which other agents' strategies are estimated from observed actions via Bayesian inference.

Correlated Q-Learning

- Economics, ICML
- 2003

Correlated-Q (CE-Q) learning is introduced, a multiagent Q-learning algorithm based on the correlated equilibrium (CE) solution concept that generalizes both Nash-Q and Friend-and-Foe-Q.