Corpus ID: 221995706

Learning to Play against Any Mixture of Opponents

@article{Smith2020LearningTP,
  title={Learning to Play against Any Mixture of Opponents},
  author={Max O. Smith and Thomas W. Anthony and Yongzhao Wang and Michael P. Wellman},
  journal={ArXiv},
  year={2020},
  volume={abs/2009.14180}
}
Intuitively, experience playing against one mixture of opponents in a given domain should be relevant for a different mixture in the same domain. We propose a transfer learning method, Q-Mixing, that starts by learning Q-values against each pure-strategy opponent. Then a Q-value for any distribution of opponent strategies is approximated by appropriately averaging the separately learned Q-values. From these components, we construct policies against all opponent mixtures without any further… 
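A minimal sketch of the mixing step described in the abstract, assuming tabular per-opponent Q-values for the current state and a known opponent mixture; the names and shapes are illustrative, and the paper's refinements (such as reweighting by a belief over opponents inferred from observations) are omitted:

import numpy as np

def q_mixing(q_per_opponent, mixture):
    """Average separately learned Q-values under an opponent mixture.

    q_per_opponent: array of shape (num_opponents, num_actions); row i holds
        Q_i(s, a) learned against pure-strategy opponent i for the current state.
    mixture: array of shape (num_opponents,), a probability distribution over
        the opponent's pure strategies.
    Returns the mixed values Q_sigma(s, a) = sum_i mixture[i] * Q_i(s, a).
    """
    q = np.asarray(q_per_opponent, dtype=float)
    w = np.asarray(mixture, dtype=float)
    assert np.isclose(w.sum(), 1.0), "mixture must sum to 1"
    return w @ q

# Illustrative usage: two pure-strategy opponents, three actions.
q_vs_each = [[1.0, 0.0, 2.0],   # learned against opponent 0
             [0.0, 3.0, 1.0]]   # learned against opponent 1
print(q_mixing(q_vs_each, [0.25, 0.75]))  # -> [0.25 2.25 1.25]

Acting greedily with respect to these mixed values yields a policy for the mixture without further training, as the abstract describes.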
Iterative Empirical Game Solving via Single Policy Best Response
TLDR
Two variations of PSRO are introduced, designed to reduce the amount of simulation required during training, while producing equivalent or better solutions to the game.
Generalized Beliefs for Cooperative AI
TLDR
This work proposes a belief learning model that can maintain beliefs over rollouts of policies not seen at training time, and can thus decode and adapt to novel conventions at test time; it also shows how this model can improve ad hoc teamplay.
Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games
TLDR
It is shown that the resulting conditional policies incorporate prior information about their opponents effectively, enabling near-optimal returns against arbitrary mixture policies in a game with tractable best responses.
NeuPL: Neural Population Learning
TLDR
This work proposes Neural Population Learning (NeuPL) and shows that novel strategies become more accessible, not less, as the neural population expands, and offers convergence guarantees to a population of best-responses under mild assumptions.
A Game-Theoretic Approach for Improving Generalization Ability of TSP Solvers
TLDR
A two-player zero-sum framework between a trainable Solver and a Data Generator is proposed to improve the generalization ability of deep learning-based solvers for the Traveling Salesman Problem (TSP).
Deep Interactive Bayesian Reinforcement Learning via Meta-Learning
TLDR
This work proposes to meta-learn approximate belief inference and Bayes-optimal behaviour for a given prior, and shows empirically that this approach outperforms existing methods that use a model-free approach, sample from the approximate posterior, maintain memory-free models of others, or do not fully utilise the known structure of the environment.

References

SHOWING 1-10 OF 65 REFERENCES
A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning
TLDR
An algorithm is described that is based on approximate best responses to mixtures of policies generated using deep reinforcement learning, together with empirical game-theoretic analysis to compute meta-strategies for policy selection; it generalizes previous algorithms such as InRL.
Deep Reinforcement Learning with Double Q-Learning
TLDR
This paper proposes a specific adaptation to the DQN algorithm and shows that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
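For reference, the adaptation summarized here decouples action selection from action evaluation in the bootstrap target; a hedged sketch, where q_online and q_target are placeholder callables mapping an observation to a vector of action values:

import numpy as np

def double_dqn_target(reward, next_obs, done, gamma, q_online, q_target):
    # Select the next action with the online network, but evaluate it with
    # the target network to reduce overestimation of action values.
    best_action = int(np.argmax(q_online(next_obs)))
    bootstrap = q_target(next_obs)[best_action]
    return reward + (0.0 if done else gamma * bootstrap)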
Reinforcement Learning: An Introduction
TLDR
This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
The Hanabi Challenge: A New Frontier for AI Research
Human-level performance in 3D multiplayer games with population-based reinforcement learning
TLDR
A tournament-style evaluation is used to demonstrate that an agent can achieve human-level performance in a three-dimensional multiplayer first-person video game, Quake III Arena in Capture the Flag mode, using only pixels and game points scored as input.
QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
TLDR
QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations, and structurally enforces that the joint action-value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning.
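As a rough illustration of the monotonicity constraint mentioned in this entry (not the paper's full hypernetwork-based mixing architecture), per-agent values can be combined with non-negative weights so the joint value is non-decreasing in each agent's value:

import numpy as np

def monotonic_mix(per_agent_values, weights, bias=0.0):
    # Non-negative weights make the joint value monotonically non-decreasing
    # in every agent's value, so the joint argmax decomposes into per-agent
    # argmaxes. QMIX itself produces such weights with hypernetworks
    # conditioned on the global state; this is only a sketch.
    w = np.abs(np.asarray(weights, dtype=float))
    return float(w @ np.asarray(per_agent_values, dtype=float) + bias)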
Cooperating with Unknown Teammates in Complex Domains: A Robot Soccer Case Study of Ad Hoc Teamwork
TLDR
A new algorithm, PLASTIC-Policy, is introduced that builds on an existing ad hoc teamwork approach, learns policies to cooperate with past teammates, and reuses these policies to quickly adapt to new teammates.
State Abstraction Discovery from Irrelevant State Variables
TLDR
This work proposes an algorithm for the automatic discovery of state abstraction from policies learned in one domain for use in other domains that have similar structure and introduces a novel condition for state abstraction in terms of the relevance of state features to optimal behavior.
Extending Q-Learning to General Adaptive Multi-Agent Systems
TLDR
This paper proposes a fundamentally different approach to Q-Learning, dubbed Hyper-Q, in which values of mixed strategies rather than base actions are learned, and in which other agents' strategies are estimated from observed actions via Bayesian inference.
Correlated Q-Learning
TLDR
Correlated-Q (CE-Q) learning is introduced, a multiagent Q-learning algorithm based on the correlated equilibrium (CE) solution concept that generalizes both Nash-Q and Friend-and-Foe-Q.