• Corpus ID: 232233084

# Modelling Behavioural Diversity for Learning in Open-Ended Games

@inproceedings{Nieves2021ModellingBD,
title={Modelling Behavioural Diversity for Learning in Open-Ended Games},
author={Nicolas Perez Nieves and Yaodong Yang and Oliver Slumbers and David Henry Mguni and Jun Wang},
booktitle={ICML},
year={2021}
}
• Published in ICML 14 March 2021
• Computer Science
Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on determinantal point processes (DPP). By incorporating the…
## 21 Citations

Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games
• Computer Science
NeurIPS
• 2021
This work summarizes previous concepts of diversity and works towards offering a unified measure of diversity in multi-agent open-ended learning that includes all elements in Markov games, based on both Behavioral Diversity (BD) and Response Diversity (RD).
Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games
• Computer Science
ArXiv
• 2021
This work introduces a framework, LMAC, based on meta-gradient descent that automates the discovery of the update rule without explicit human design and is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO on Leduc Poker.
Neural Auto-Curricula
• Computer Science
• 2021
This paper introduces a novel framework—Neural Auto-Curricula (NAC)—that leverages meta-gradient descent to automate the discovery of the learning update rule without explicit human design, and shows that NAC is able to generalise from small games to large games.
Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games
• Computer Science
• 2022
Self-Play PSRO (SP-PSRO) is introduced, a method that adds an approximately optimal stochastic policy to the population in each iteration and empirically tends to converge much faster than APSRO and in many games converges in just a few iterations.
Learning Risk-Averse Equilibria in Multi-Agent Systems
• Economics
ArXiv
• 2022
In multi-agent systems, intelligent agents are tasked with making decisions that have optimal outcomes when the actions of the other agents are as expected, whilst also being prepared for unexpected
On the Convergence of Fictitious Play: A Decomposition Approach
• Computer Science
IJCAI
• 2022
A linear relationship unifying cooperation and competition is established, in the sense that these two classes of games are mutually transferable, and sufficient conditions for FP to converge are derived.
A Unified Perspective on Deep Equilibrium Finding
• Computer Science
ArXiv
• 2022
A unified perspective on deep equilibrium finding that generalizes both PSRO and CFR is proposed and demonstrates that the approach can outperform both frameworks.
Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization
• Computer Science
ArXiv
• 2022
Experiments show that RSPO is able to discover a wide spectrum of strategies in a variety of domains, ranging from single-agent particle-world tasks and MuJoCo continuous control to multi-agent stag-hunt games and StarCraft II challenges.
Efficient Policy Space Response Oracles
• Computer Science
ArXiv
• 2022
Theoretically, the solution procedures of EPSRO offer a monotonic improvement on the exploitability, which no existing PSRO method possesses, and it is proved that the no-regret optimization has a regret bound of $O(\sqrt{T \log[(k^2 + k)/2]})$, where $k$ is the size of the restricted policy set.
Measuring the Non-Transitivity in Chess
• Computer Science
Algorithms
• 2022
It is concluded that maintaining large and diverse populations of strategies is imperative to training effective AI agents for solving chess and the implications of non-transitivity for population-based training methods are investigated.

## References

A Generalized Training Approach for Multiagent Learning
• Computer Science
ICLR
• 2020
This paper extends the theoretical underpinnings of PSRO by considering an alternative solution concept, $\alpha$-Rank, which is unique (and thus faces no equilibrium selection issues, unlike Nash) and applies readily to general-sum, many-player settings, and establishes convergence guarantees in several game classes.
α-Rank: Multi-Agent Evaluation by Evolution
• Economics
Scientific Reports
• 2019
We introduce α-Rank, a principled evolutionary dynamics methodology, for the evaluation and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical game-theoretic
Open-ended Learning in Symmetric Zero-sum Games
• Economics
ICML
• 2019
A geometric framework for formulating agent objectives in zero-sum games is introduced, and a new algorithm (rectified Nash response, PSRO_rN) is developed that uses game-theoretic niching to construct diverse populations of effective agents, producing a stronger set of agents than existing algorithms.
Generalised weakened fictitious play
• Psychology
Games Econ. Behav.
• 2006
The complexity of computing a Nash equilibrium
• Economics
STOC '06
• 2006
This proof uses ideas from the recently-established equivalence between polynomial time solvability of normal form games and graphical games, establishing that these kinds of games can simulate a PPAD-complete class of Brouwer functions.
AlphaStar: Mastering the Real-Time Strategy Game StarCraft II
• DeepMind Blog
• 2019
Diversity is All You Need: Learning Skills without a Reward Function
• Computer Science
ICLR
• 2019
The proposed DIAYN ("Diversity is All You Need"), a method for learning useful skills without a reward function, learns skills by maximizing an information theoretic objective using a maximum entropy policy.
A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning
• Computer Science
NIPS
• 2017
An algorithm is described, based on approximate best responses to mixtures of policies generated using deep reinforcement learning, and empirical game-theoretic analysis to compute meta-strategies for policy selection, which generalizes previous ones such as InRL.
Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems
• Computer Science
AAMAS
• 2021
It is argued that behavioural diversity is a pivotal, yet under-explored, component for real-world multiagent learning systems, and that significant work remains in understanding how to design a diversity-aware auto-curriculum.