Publications
QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
TLDR
QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations. The network structurally enforces that the joint-action value is monotonic in the per-agent values, which allows tractable maximisation of the joint-action value in off-policy learning.
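The monotonicity constraint can be illustrated with a toy mixer. This is a simplified sketch of the idea, not the paper's implementation: hypothetical linear "hypernetworks" map the global state to mixing weights, `abs()` keeps those weights non-negative, and an ELU hidden layer supplies the non-linearity. Because every path from an agent's value to Q_tot has a non-negative slope, each agent's own greedy action also maximises Q_tot. The names `MonotonicMixer`, `decentralised_greedy` and `joint_argmax_value` are illustrative, and random untrained parameters stand in for learned ones.

```python
import itertools
import math
import random


def elu(x):
    # Exponential linear unit: strictly increasing, so it preserves monotonicity.
    return x if x > 0 else math.exp(x) - 1.0


def linear(rows, x):
    # Matrix-vector product with plain lists: one output per row.
    return [sum(w * xi for w, xi in zip(row, x)) for row in rows]


class MonotonicMixer:
    """Toy QMIX-style mixer (illustrative, untrained).

    Q_tot = w2 . elu(W1 q + b1) + b2, where W1, b1, w2, b2 are produced from the
    global state by hypothetical linear hypernetworks. abs() on W1 and w2 keeps
    dQ_tot/dQ_a >= 0 for every agent a.
    """

    def __init__(self, n_agents, state_dim, hidden=8, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

        self.n, self.h = n_agents, hidden
        self.hyper_w1 = rand(n_agents * hidden, state_dim)  # state -> W1 (flattened)
        self.hyper_b1 = rand(hidden, state_dim)             # state -> b1
        self.hyper_w2 = rand(hidden, state_dim)             # state -> w2
        self.hyper_b2 = rand(1, state_dim)                  # state -> b2

    def __call__(self, agent_qs, state):
        w1 = [abs(v) for v in linear(self.hyper_w1, state)]  # non-negative weights
        b1 = linear(self.hyper_b1, state)                    # biases may be negative
        w2 = [abs(v) for v in linear(self.hyper_w2, state)]
        b2 = linear(self.hyper_b2, state)[0]
        hidden = [
            elu(sum(agent_qs[a] * w1[a * self.h + j] for a in range(self.n)) + b1[j])
            for j in range(self.h)
        ]
        return sum(w * z for w, z in zip(w2, hidden)) + b2


def decentralised_greedy(per_agent_q):
    # Each agent maximises only its own Q_a; no search over joint actions.
    return [max(range(len(q)), key=q.__getitem__) for q in per_agent_q]


def joint_argmax_value(mixer, per_agent_q, state):
    # Brute-force max of Q_tot over all joint actions (exponential in agent count).
    return max(
        mixer([q[u] for q, u in zip(per_agent_q, joint)], state)
        for joint in itertools.product(*(range(len(q)) for q in per_agent_q))
    )
```

The brute-force search is only there for comparison: its cost grows exponentially with the number of agents, whereas the decentralised argmax is linear in it.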
Counterfactual Multi-Agent Policy Gradients
TLDR
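A new multi-agent actor-critic method, counterfactual multi-agent (COMA) policy gradients, uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. To address multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action while keeping the other agents' actions fixed.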
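The counterfactual baseline compares the value of the joint action actually taken with a policy-weighted average over one agent's alternative actions, holding the other agents' actions fixed. A minimal sketch, assuming the centralised critic is available as a hypothetical function `q_fn` from joint-action tuples to Q-values for the current state (the paper's critic instead outputs all of one agent's counterfactual Q-values in a single forward pass):

```python
def counterfactual_advantage(q_fn, joint_action, agent, policy_probs):
    """COMA-style advantage for one agent (illustrative sketch).

    q_fn: hypothetical centralised critic, joint-action tuple -> Q(s, u) for a fixed s.
    joint_action: the actions actually taken by all agents.
    agent: index of the agent whose credit is being assigned.
    policy_probs: that agent's policy pi_a(. | tau_a) over its own actions.
    """
    baseline = 0.0
    for alt_action, prob in enumerate(policy_probs):
        counterfactual = list(joint_action)
        counterfactual[agent] = alt_action  # vary only this agent's action
        baseline += prob * q_fn(tuple(counterfactual))
    # Positive advantage: the chosen action did better than this agent's policy on average.
    return q_fn(tuple(joint_action)) - baseline


# Usage with a toy two-agent, two-action Q-table standing in for the critic.
toy_q = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 3.0, (1, 1): 2.0}
advantage = counterfactual_advantage(toy_q.__getitem__, (1, 0), agent=0, policy_probs=[0.5, 0.5])
```

Because the baseline depends only on the other agents' fixed actions and the agent's own policy, it does not change the expected policy gradient, while isolating that agent's contribution to the shared reward.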
Learning to Communicate with Deep Multi-Agent Reinforcement Learning
TLDR
Using deep neural networks, this work demonstrates end-to-end learning of communication protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability.
The StarCraft Multi-Agent Challenge
TLDR
The StarCraft Multi-Agent Challenge (SMAC), based on the popular real-time strategy game StarCraft II, is proposed as a benchmark problem, and an open-source deep multi-agent RL framework including state-of-the-art algorithms is released.
Learning with Opponent-Learning Awareness
TLDR
Results show that when two LOLA agents play each other, tit-for-tat and therefore cooperation emerge in the iterated prisoner's dilemma, whereas independent learning does not produce this outcome. LOLA also receives higher payouts than a naive learner and is robust against exploitation by higher-order gradient-based methods.
A theoretical and empirical analysis of Expected Sarsa
TLDR
It is proved that Expected Sarsa converges under the same conditions as Sarsa, specific hypotheses are formulated about when Expected Sarsa will outperform Sarsa and Q-learning, and it is demonstrated that Expected Sarsa has significant advantages over these more commonly used methods.
A Survey of Multi-Objective Sequential Decision-Making
TLDR
This article surveys algorithms designed for sequential decision-making problems with multiple objectives and proposes a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function, and the type of policies considered.
Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning
TLDR
Two methods enable the successful combination of experience replay with multi-agent RL: a multi-agent variant of importance sampling that naturally decays obsolete data, and conditioning each agent's value function on a fingerprint that disambiguates the age of the data sampled from the replay memory.
LipNet: End-to-End Sentence-level Lipreading
TLDR
This work presents LipNet, a model that maps a variable-length sequence of video frames to text, making use of spatiotemporal convolutions, a recurrent network, and the connectionist temporal classification loss, trained entirely end-to-end.
Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem
TLDR
A sharp finite-time regret bound of order O(K log T), where K is the number of arms and T the time horizon, is proved for a very general class of dueling bandit problems, matching a lower bound proven in (Yue et al., 2012).
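RUCB compares arms through optimistic estimates U_ij of the probability that arm i beats arm j. The sketch below is one reading of a single selection step, under stated assumptions: `wins[i][j]` counts how often arm i beat arm j, `alpha` is the exploration parameter (the analysis assumes alpha > 1/2), the x/0 := 1 convention for unexplored pairs is applied element-wise, and the candidate is drawn uniformly from arms whose upper bounds are at least 1/2 against every other arm, which simplifies the published selection rule. `rucb_select` is a hypothetical name.

```python
import math
import random


def rucb_select(wins, t, alpha=0.51, rng=random):
    """Pick (candidate c, opponent d) for round t >= 1 (simplified RUCB sketch)."""
    k = len(wins)
    u = [[0.5] * k for _ in range(k)]  # U_ii stays fixed at 1/2
    for i in range(k):
        for j in range(k):
            if i == j:
                continue
            n = wins[i][j] + wins[j][i]
            # Empirical win rate plus confidence bonus; x/0 := 1 (assumed element-wise).
            ratio = wins[i][j] / n if n else 1.0
            bonus = math.sqrt(alpha * math.log(t) / n) if n else 1.0
            u[i][j] = ratio + bonus
    # Candidates: arms that optimistically beat or tie every other arm.
    candidates = [c for c in range(k) if all(u[c][j] >= 0.5 for j in range(k))]
    c = rng.choice(candidates) if candidates else rng.randrange(k)
    # Opponent: the arm with the most optimistic chance of beating c (may be c itself).
    d = max(range(k), key=lambda j: u[j][c])
    return c, d


wins = [[0, 90, 90], [10, 0, 50], [10, 50, 0]]
choice = rucb_select(wins, t=1000, rng=random.Random(0))
```

When d equals c, the candidate is compared with itself, which happens once the other arms' upper bounds against it fall below 1/2; in the example above, arm 0 is the only candidate and no other arm is plausibly better, so the step returns (0, 0).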