QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
TLDR
QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations, and structurally enforces that the joint action-value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning.
Counterfactual Multi-Agent Policy Gradients
TLDR
A new multi-agent actor-critic method, counterfactual multi-agent (COMA) policy gradients, uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies.
Learning to Communicate with Deep Multi-Agent Reinforcement Learning
TLDR
By embracing deep neural networks, this work is able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability.
Learning with Opponent-Learning Awareness
TLDR
Results show that the encounter of two LOLA agents leads to the emergence of tit-for-tat, and therefore cooperation, in the iterated prisoner's dilemma, while independent learning does not. LOLA also receives higher payouts than a naive learner and is robust against exploitation by higher-order gradient-based methods.
The StarCraft Multi-Agent Challenge
TLDR
The StarCraft Multi-Agent Challenge (SMAC), based on the popular real-time strategy game StarCraft II, is proposed as a benchmark problem, and an open-source deep multi-agent RL framework including state-of-the-art algorithms is released.
The Mechanics of n-Player Differentiable Games
TLDR
The key result is to decompose the second-order dynamics into two components: the first relates to potential games, which reduce to gradient descent on an implicit function; the second relates to Hamiltonian games, a new class of games that obey a conservation law, akin to conservation laws in classical mechanical systems.
Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning
TLDR
Two methods, one using a multi-agent variant of importance sampling to naturally decay obsolete data and the other conditioning each agent's value function on a fingerprint that disambiguates the age of the data sampled from the replay memory, enable the successful combination of experience replay with multi-agent RL.
The Hanabi Challenge: A New Frontier for AI Research
TLDR
It is argued that Hanabi elevates reasoning about the beliefs and intentions of other agents to the foreground, and that developing novel techniques for such theory-of-mind reasoning will be crucial not only for success in Hanabi but also in broader collaborative efforts, especially those with human partners.
DiCE: The Infinitely Differentiable Monte-Carlo Estimator
TLDR
DiCE is introduced, which provides a single objective that can be differentiated repeatedly, generating correct gradient estimators of any order in stochastic computation graphs (SCGs), and is used to propose and evaluate a novel approach for multi-agent learning.
On the Pitfalls of Measuring Emergent Communication
TLDR
By training deep reinforcement learning agents to play simple matrix games augmented with a communication channel, this paper finds a scenario where agents appear to communicate, and yet the messages do not impact the environment or the other agent in any way.