Publications
Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games
This work introduces LMAC, a meta-gradient-descent framework that discovers the multi-agent learning update rule automatically, without explicit human design. LMAC generalises from small games to large games: for example, trained only on Kuhn Poker, it outperforms PSRO on Leduc Poker.
Modelling Behavioural Diversity for Learning in Open-Ended Games
By incorporating a diversity metric into best-response dynamics, this work develops diverse fictitious play and a diverse policy-space response oracle for solving normal-form and open-ended games, and proves both the uniqueness of the diverse best response and the convergence of the algorithms on two-player games.
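The diverse dynamics above extend vanilla fictitious play, in which each player repeatedly best-responds to the opponent's empirical action frequencies. As a minimal sketch of that baseline only (the Rock-Paper-Scissors matrix and iteration count are illustrative assumptions, and the paper's diversity term is not implemented here):

```python
import numpy as np

# Row player's payoff in Rock-Paper-Scissors (zero-sum, so the
# column player's payoff is the negation).
M = np.array([[0, -1,  1],
              [1,  0, -1],
              [-1, 1,  0]], dtype=float)

def fictitious_play(M, iters=20000):
    """Vanilla fictitious play: on each step, every player plays a
    pure best response to the opponent's empirical mixture so far.
    Returns the two empirical (time-averaged) strategies."""
    m, n = M.shape
    row_counts = np.zeros(m)
    col_counts = np.zeros(n)
    row_counts[0] += 1  # arbitrary initial actions
    col_counts[0] += 1
    for _ in range(iters):
        # Best response to the opponent's empirical frequencies.
        br_row = int(np.argmax(M @ (col_counts / col_counts.sum())))
        br_col = int(np.argmin((row_counts / row_counts.sum()) @ M))
        row_counts[br_row] += 1
        col_counts[br_col] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()
```

In two-player zero-sum games the empirical frequencies converge to a Nash equilibrium (uniform play, here); the paper's contribution is to bias these best responses toward behaviourally diverse strategies.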
Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems
It is argued that behavioural diversity is a pivotal, yet under-explored, component for real-world multiagent learning systems, and that significant work remains in understanding how to design a diversity-aware auto-curriculum.
Online Double Oracle
This paper proposes Online Double Oracle (ODO), a new learning algorithm for solving two-player zero-sum normal-form games in which the number of pure strategies is prohibitively large. ODO is rational in the sense that each agent can exploit a strategic adversary with a provable regret bound.
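For context, the classical (offline) double oracle loop that ODO adapts to the online setting can be sketched as follows. This is a rough illustration, not the paper's implementation: the restricted-game LP solver, the starting strategies, and the tolerance are all assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(M):
    """Maximin mixed strategy for the row player of payoff matrix M,
    via the standard LP: maximise v s.t. x^T M[:, j] >= v for all j."""
    m, n = M.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                          # minimise -v
    A_ub = np.hstack([-M.T, np.ones((n, 1))])  # v - x^T M[:, j] <= 0
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0                     # probabilities sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[:m], res.x[-1]

def double_oracle(M, row0=0, col0=0, tol=1e-8):
    """Double oracle: start from one pure strategy per player, solve the
    restricted game, and add each player's exact best response to the
    restricted equilibrium until neither can improve."""
    rows, cols = [row0], [col0]
    while True:
        sub = M[np.ix_(rows, cols)]
        x, v = solve_matrix_game(sub)      # row player's restricted eq.
        y, _ = solve_matrix_game(-sub.T)   # column player's (minimiser)
        row_payoffs = M[:, cols] @ y       # every row vs restricted y
        col_payoffs = x @ M[rows, :]       # restricted x vs every column
        br_row, br_col = int(np.argmax(row_payoffs)), int(np.argmin(col_payoffs))
        grew = False
        if row_payoffs[br_row] > v + tol and br_row not in rows:
            rows.append(br_row); grew = True
        if col_payoffs[br_col] < v - tol and br_col not in cols:
            cols.append(br_col); grew = True
        if not grew:
            return x, y, rows, cols, v
```

On Rock-Paper-Scissors this grows the restricted game one strategy at a time until it contains all three actions and the uniform equilibrium is recovered; ODO's point is to avoid enumerating the full payoff matrix when the strategy space is too large for this offline loop.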
Neural Auto-Curricula
This paper introduces a novel framework, Neural Auto-Curricula (NAC), that leverages meta-gradient descent to automate the discovery of the learning update rule without explicit human design, and shows that NAC is able to generalise from small games to large games.
Learning Risk-Averse Equilibria in Multi-Agent Systems
In multi-agent systems, intelligent agents are tasked with making decisions that have optimal outcomes when the actions of the other agents are as expected, whilst also being prepared for unexpected deviations from that behaviour.
A Game-Theoretic Approach for Improving Generalization Ability of TSP Solvers
This work proposes a two-player zero-sum framework between a trainable Solver and a Data Generator to improve the generalization ability of deep-learning-based solvers for the Traveling Salesman Problem (TSP).
Timing is Everything: Learning to Act Selectively with Costly Actions and Budgetary Constraints
This paper proves that LICRA, which seamlessly adopts any RL method, converges to policies that optimally select when to perform actions and at what magnitude, and shows that LICRA learns the optimal value function while ensuring budget constraints are satisfied almost surely.
LIGS: Learnable Intrinsic-Reward Generation Selection for Multi-Agent Learning
This work proposes the Learnable Intrinsic-Reward Generation Selection algorithm (LIGS), a new general framework for improving the coordination and performance of multi-agent reinforcement learning (MARL) agents. LIGS introduces an adaptive learner, the Generator, which observes the agents and learns to construct intrinsic rewards online that coordinate the agents' joint exploration and joint behaviour.