Corpus ID: 232233084

Modelling Behavioural Diversity for Learning in Open-Ended Games

  Nicolas Perez Nieves, Yaodong Yang, Oliver Slumbers, David Henry Mguni, Jun Wang
Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on determinantal point processes (DPP). By incorporating the… 
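A minimal sketch of what a DPP-style diversity score can look like, assuming strategies are represented by their payoff vectors against a fixed opponent set and the kernel is a plain Gram matrix (the paper's exact kernel construction may differ): the log-determinant is large when behaviours are distinct and collapses when any strategy duplicates another.

```python
import numpy as np

def dpp_diversity(payoff_rows):
    """Log-determinant of the Gram (kernel) matrix L = F F^T over payoff features F.

    det(L) grows with the volume spanned by the rows and drops to ~0 when any
    row is a linear combination of the others (i.e., duplicated behaviour).
    A tiny jitter keeps the log-determinant finite for singular L.
    """
    F = np.asarray(payoff_rows, dtype=float)
    L = F @ F.T
    sign, logdet = np.linalg.slogdet(L + 1e-9 * np.eye(len(F)))
    return logdet

# Three orthogonal behaviours vs. a population with a duplicated member.
diverse = dpp_diversity([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
redundant = dpp_diversity([[1, 0, 0], [1, 0, 0], [0, 0, 1]])
print(diverse > redundant)  # → True: the duplicate shrinks the determinant
```

The determinant-as-volume view is exactly the geometric interpretation the abstract alludes to: adding a strategy only raises the score insofar as it adds a new "direction" of behaviour.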


Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games
This work summarizes previous concepts of diversity and works towards a unified measure of diversity for multi-agent open-ended learning that covers all elements of Markov games, based on both Behavioral Diversity (BD) and Response Diversity (RD).
Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games
This work introduces a framework, LMAC, based on meta-gradient descent that automates the discovery of the update rule without explicit human design and is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO on Leduc Poker.
Neural Auto-Curricula
This paper introduces a novel framework, Neural Auto-Curricula (NAC), that leverages meta-gradient descent to automate the discovery of the learning update rule without explicit human design, and shows that NAC is able to generalise from small games to large games.
Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games
Self-Play PSRO (SP-PSRO) is introduced, a method that adds an approximately optimal stochastic policy to the population in each iteration and empirically tends to converge much faster than APSRO and in many games converges in just a few iterations.
Learning Risk-Averse Equilibria in Multi-Agent Systems
In multi-agent systems, intelligent agents are tasked with making decisions that have optimal outcomes when the actions of the other agents are as expected, whilst also being prepared for unexpected…
On the Convergence of Fictitious Play: A Decomposition Approach
A linear relationship unifying cooperation and competition in the sense that these two classes of games are mutually transferable is developed, and sufficient conditions for FP to converge are developed.
A Unified Perspective on Deep Equilibrium Finding
A unified perspective on deep equilibrium finding that generalizes both PSRO and CFR is proposed and demonstrates that the approach can outperform both frameworks.
Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization
Experiments show that RSPO is able to discover a wide spectrum of strategies in a variety of domains, ranging from single-agent particle-world tasks and MuJoCo continuous control to multi-agent stag-hunt games and StarCraftII challenges.
Efficient Policy Space Response Oracles
Theoretically, the solution procedure of EPSRO offers a monotonic improvement on exploitability, which no existing PSRO method possesses, and it is proved that the no-regret optimization has a regret bound of $O(\sqrt{T \log[(k^2 + k)/2]})$, where $k$ is the size of the restricted policy set.
Measuring the Non-Transitivity in Chess
It is concluded that maintaining large and diverse populations of strategies is imperative to training effective AI agents for solving chess and the implications of non-transitivity for population-based training methods are investigated.


A Generalized Training Approach for Multiagent Learning
This paper extends the theoretical underpinnings of PSRO by considering an alternative solution concept, $\alpha$-Rank, which is unique (thus faces no equilibrium selection issues, unlike Nash) and applies readily to general-sum, many-player settings, and establishes convergence guarantees in several classes of games.
α-Rank: Multi-Agent Evaluation by Evolution
We introduce α-Rank, a principled evolutionary dynamics methodology for the evaluation and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical game-theoretic solution concept.
Open-ended Learning in Symmetric Zero-sum Games
A geometric framework for formulating agent objectives in zero-sum games is introduced, and a new algorithm (rectified Nash response, PSRO_rN) is developed that uses game-theoretic niching to construct diverse populations of effective agents, producing a stronger set of agents than existing algorithms.
Generalised weakened fictitious play
The complexity of computing a Nash equilibrium
This proof uses ideas from the recently-established equivalence between polynomial time solvability of normal form games and graphical games, establishing that these kinds of games can simulate a PPAD-complete class of Brouwer functions.
AlphaStar: Mastering the Real-Time Strategy Game StarCraft II
  DeepMind blog, 2019
Diversity is All You Need: Learning Skills without a Reward Function
DIAYN ("Diversity is All You Need") is a method for learning useful skills without a reward function; it learns skills by maximizing an information-theoretic objective using a maximum-entropy policy.
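The information-theoretic objective the summary refers to can be sketched in the DIAYN paper's notation (skill latent $z \sim p(z)$, states $s$, actions $a$); this is reconstructed from memory of the published objective, so verify the exact form against the paper:

\[
\mathcal{F}(\theta) = I(S;Z) + \mathcal{H}[A \mid S] - I(A;Z \mid S)
                    = \mathcal{H}[A \mid S, Z] + \mathcal{H}[Z] - \mathcal{H}[Z \mid S],
\]

which, since $\mathcal{H}[Z \mid S]$ is intractable, is lower-bounded with a learned discriminator $q_\phi(z \mid s)$:

\[
\mathcal{F}(\theta) \;\ge\; \mathcal{H}[A \mid S, Z] + \mathbb{E}_{z \sim p(z),\, s \sim \pi}\!\left[\log q_\phi(z \mid s) - \log p(z)\right].
\]

Maximizing the bound rewards each skill for visiting states that identify it (discriminability) while the entropy term keeps actions as random as possible.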
A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning
An algorithm is described, based on approximate best responses to mixtures of policies generated using deep reinforcement learning, together with empirical game-theoretic analysis to compute meta-strategies for policy selection; it generalizes previous algorithms such as independent RL (InRL).
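The loop described above can be sketched on a toy matrix game. This is a hedged, minimal stand-in: real PSRO trains deep-RL best responses and may solve the restricted game with a Nash solver, whereas here a pure-strategy argmax plays the oracle and fictitious play approximates the meta-strategy.

```python
import numpy as np

# Row player's payoff in symmetric zero-sum Rock-Paper-Scissors.
GAME = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])

def meta_strategy(sub_payoffs, iters=2000):
    """Fictitious play on the empirical (restricted) game -> approximate Nash mix."""
    n = len(sub_payoffs)
    counts = np.ones(n)
    for _ in range(iters):
        mix = counts / counts.sum()
        counts[np.argmax(sub_payoffs @ mix)] += 1  # best response to current mix
    return counts / counts.sum()

population = [0]  # start with the pure strategy "rock"
for _ in range(3):
    sub = GAME[np.ix_(population, population)]  # empirical game among members
    mix = meta_strategy(sub)                    # meta-strategy over the population
    opponent_mix = np.zeros(3)
    for member, weight in zip(population, mix):
        opponent_mix[member] += weight
    # Oracle step: best pure response to the population's meta-strategy.
    best_response = int(np.argmax(GAME @ opponent_mix))
    if best_response not in population:
        population.append(best_response)

print(sorted(population))  # → [0, 1, 2]: the cycle forces full support
```

Even this toy version shows the key behaviour on non-transitive games: each oracle call adds the strategy that beats the current mixture, so the population expands until it covers the whole strategic cycle.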
Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems
It is argued that behavioural diversity is a pivotal, yet under-explored, component for real-world multiagent learning systems, and that significant work remains in understanding how to design a diversity-aware auto-curriculum.