QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
- Tabish Rashid, Mikayel Samvelyan, C. S. D. Witt, Gregory Farquhar, Jakob N. Foerster, Shimon Whiteson
- Computer Science, International Conference on Machine Learning
- 30 March 2018
QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations. It structurally enforces that the joint action-value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning.
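The monotonicity constraint described above can be illustrated with a minimal sketch. It assumes a tiny two-layer mixer with fixed, made-up weights (in QMIX the mixing weights are produced by hypernetworks conditioned on the global state, which is omitted here); all names and numbers are hypothetical. Taking the absolute value of the weights keeps the mixed value increasing in every per-agent value, so each agent's greedy action also maximises the joint value.

```python
import itertools
import math


def elu(x):
    # Strictly increasing activation, so it preserves monotonicity.
    return x if x > 0 else math.exp(x) - 1.0


def mix(agent_qs, w1, b1, w2, b2):
    # abs() makes every mixing weight non-negative, so the output is
    # non-decreasing in each per-agent value.
    hidden = [
        elu(sum(abs(w1[i][j]) * q for i, q in enumerate(agent_qs)) + b1[j])
        for j in range(len(b1))
    ]
    return sum(abs(w) * h for w, h in zip(w2, hidden)) + b2


# Hypothetical per-agent action-values: 2 agents, 3 actions each.
agent_q_tables = [[0.2, 1.5, -0.3], [0.9, -1.0, 0.4]]
# Hypothetical fixed mixer parameters (assumption: no state conditioning).
w1 = [[0.5, -1.2], [-0.7, 0.3]]  # w1[agent][hidden_unit]
b1 = [-0.4, 0.1]
w2 = [1.1, -0.6]
b2 = 0.05


def q_tot(joint_action):
    chosen = [agent_q_tables[i][a] for i, a in enumerate(joint_action)]
    return mix(chosen, w1, b1, w2, b2)


# Decentralised maximisation: each agent picks its own greedy action.
greedy_joint = tuple(
    max(range(len(qs)), key=lambda a: qs[a]) for qs in agent_q_tables
)
# Centralised brute force over all joint actions, for comparison.
brute_force_joint = max(itertools.product(range(3), repeat=2), key=q_tot)
```

In this sketch the two searches agree, which is the property that makes the maximisation tractable: the per-agent argmax costs a number of evaluations linear in the number of agents, while the brute-force search grows exponentially.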
Counterfactual Multi-Agent Policy Gradients
- Jakob N. Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson
- Computer Science, AAAI Conference on Artificial Intelligence
- 24 May 2017
This work proposes a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients, which uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies.
Learning to Communicate with Deep Multi-Agent Reinforcement Learning
- Jakob N. Foerster, Yannis Assael, N. D. Freitas, Shimon Whiteson
- Computer Science, NIPS
- 1 May 2016
By embracing deep neural networks, this work is able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability.
The StarCraft Multi-Agent Challenge
- Mikayel Samvelyan, Tabish Rashid, Shimon Whiteson
- Computer Science, Adaptive Agents and Multi-Agent Systems
- 11 February 2019
The StarCraft Multi-Agent Challenge (SMAC), based on the popular real-time strategy game StarCraft II, is proposed as a benchmark problem, and an open-source deep multi-agent RL framework including state-of-the-art algorithms is released.
Learning with Opponent-Learning Awareness
- Jakob N. Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, P. Abbeel, Igor Mordatch
- Computer Science, Adaptive Agents and Multi-Agent Systems
- 13 September 2017
Results show that the encounter of two LOLA agents leads to the emergence of tit-for-tat, and therefore cooperation, in the iterated prisoner's dilemma, while independent learning does not. LOLA also receives higher payouts than a naive learner and is robust against exploitation by higher-order gradient-based methods.
A Survey of Multi-Objective Sequential Decision-Making
- Diederik M. Roijers, P. Vamplew, Shimon Whiteson, R. Dazeley
- Computer Science, Journal of Artificial Intelligence Research
- 1 October 2013
This article surveys algorithms designed for sequential decision-making problems with multiple objectives and proposes a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function, and the type of policies considered.
LipNet: End-to-End Sentence-level Lipreading
- Yannis Assael, Brendan Shillingford, Shimon Whiteson, N. D. Freitas
- Computer Science
- 4 November 2016
This work presents LipNet, a model that maps a variable-length sequence of video frames to text, making use of spatiotemporal convolutions, a recurrent network, and the connectionist temporal classification loss, trained entirely end-to-end.
Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning
- Jakob N. Foerster, Nantas Nardelli, Shimon Whiteson
- Computer Science, International Conference on Machine Learning
- 28 February 2017
Two methods enable the successful combination of experience replay with multi-agent RL: using a multi-agent variant of importance sampling to naturally decay obsolete data, and conditioning each agent's value function on a fingerprint that disambiguates the age of the data sampled from the replay memory.
VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning
- L. Zintgraf, K. Shiarlis, Shimon Whiteson
- Computer Science, International Conference on Learning…
- 18 October 2019
This paper introduces variational Bayes-Adaptive Deep RL (variBAD), a way to meta-learn to perform approximate inference in an unknown environment and to incorporate task uncertainty directly during action selection; it achieves higher online return than existing methods.
A theoretical and empirical analysis of Expected Sarsa
- H. V. Seijen, H. V. Hasselt, Shimon Whiteson, M. Wiering
- Computer Science, IEEE Symposium on Adaptive Dynamic Programming…
- 15 May 2009
It is proved that Expected Sarsa converges under the same conditions as Sarsa, specific hypotheses are formulated about when Expected Sarsa will outperform Sarsa and Q-learning, and it is demonstrated that Expected Sarsa has significant advantages over these more commonly used methods.
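For readers unfamiliar with the method, the following is a minimal tabular sketch of one Expected Sarsa update, not code from the paper. Where Sarsa bootstraps from the value of the single next action that was sampled, Expected Sarsa bootstraps from the expectation over next actions under the current policy. The function names, the epsilon-greedy policy, and all numbers are assumptions chosen for illustration.

```python
def epsilon_greedy_probs(q_values, epsilon):
    # Action probabilities of an epsilon-greedy policy over q_values.
    n = len(q_values)
    best = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[best] += 1.0 - epsilon
    return probs


def expected_sarsa_update(q, state, action, reward, next_state,
                          policy_probs, alpha=0.1, gamma=0.99):
    # Bootstrap from the policy-weighted average of next-state values
    # instead of the value of one sampled next action (as Sarsa does).
    expected_next = sum(
        p * q[next_state][a] for a, p in enumerate(policy_probs)
    )
    td_target = reward + gamma * expected_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Hypothetical Q-table: 2 states, 2 actions.
q = [[0.0, 0.0], [1.0, 3.0]]
probs = epsilon_greedy_probs(q[1], epsilon=0.2)  # roughly [0.1, 0.9]
new_value = expected_sarsa_update(
    q, state=0, action=0, reward=1.0, next_state=1,
    policy_probs=probs, alpha=0.5, gamma=0.9,
)
# expected_next = 0.1 * 1.0 + 0.9 * 3.0 = 2.8
# td_target     = 1.0 + 0.9 * 2.8      = 3.52
# new_value     = 0.0 + 0.5 * 3.52     = 1.76 (up to float rounding)
```

Averaging over next actions removes the sampling noise that the choice of next action adds to the Sarsa target, at the cost of one pass over the action set per update.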
...