QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
- Tabish Rashid, Mikayel Samvelyan, C. S. D. Witt, Gregory Farquhar, Jakob N. Foerster, S. Whiteson
- Computer Science, Mathematics · ICML
- 30 March 2018
QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations, and structurally enforces that the joint action-value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning.
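The monotonicity constraint described above can be illustrated with a minimal sketch. It assumes a small two-layer mixer whose weights are made non-negative with absolute values; the fixed random weights, the sizes, and the names `mix`, `joint_value`, and `greedy_joint_action` are hypothetical and chosen for illustration only, not taken from the paper's implementation. The point is that each agent can maximise its own value independently and still reach the maximum of the joint value.

```python
import itertools
import math
import random


def elu(x):
    # Monotonically increasing activation, so it preserves monotonicity.
    return x if x > 0 else math.exp(x) - 1.0


def mix(agent_qs, w1, b1, w2, b2):
    """Combine per-agent values into a joint value with a two-layer mixer.

    Taking absolute values makes every mixing weight non-negative, so the
    output never decreases when any single per-agent value increases.
    """
    hidden = [
        elu(sum(abs(w) * q for w, q in zip(row, agent_qs)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(abs(w) * h for w, h in zip(w2, hidden)) + b2


def greedy_joint_action(per_agent_q):
    # Each agent picks the action that maximises its own value.
    return tuple(max(range(len(q)), key=q.__getitem__) for q in per_agent_q)


# Illustrative sizes and random parameters (assumptions, not from the paper).
rng = random.Random(0)
n_agents, n_actions, n_hidden = 3, 4, 5
w1 = [[rng.uniform(-1, 1) for _ in range(n_agents)] for _ in range(n_hidden)]
b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
b2 = rng.uniform(-1, 1)
per_agent_q = [
    [rng.uniform(-2, 2) for _ in range(n_actions)] for _ in range(n_agents)
]


def joint_value(joint_action):
    chosen = [per_agent_q[i][a] for i, a in enumerate(joint_action)]
    return mix(chosen, w1, b1, w2, b2)


decentralised = greedy_joint_action(per_agent_q)
# Brute-force search over all joint actions, used here only as a check.
centralised = max(
    itertools.product(range(n_actions), repeat=n_agents), key=joint_value
)
print(joint_value(decentralised), joint_value(centralised))
```

The brute-force search grows exponentially with the number of agents, while the per-agent maximisation is linear in it, which is the tractability benefit the summary refers to.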
Counterfactual Multi-Agent Policy Gradients
- Jakob N. Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, S. Whiteson
- Computer Science · AAAI
- 24 May 2017
A new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients, which uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies.
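The "counterfactual" part of the name refers to how each agent's advantage is computed from the centralised critic. The summary above does not spell this out, so the following is a sketch under the assumption that the advantage is the critic's value for the action actually taken minus the policy-weighted average of the critic's values over that agent's alternative actions, with the other agents' actions held fixed. The function name and the example numbers are hypothetical.

```python
def counterfactual_advantage(q_per_own_action, policy_probs, taken_action):
    """Advantage of one agent's action against a counterfactual baseline.

    q_per_own_action[i] is the centralised critic's estimate when this agent
    takes action i and all other agents' actions stay as they were.
    policy_probs[i] is this agent's probability of choosing action i.
    """
    # Baseline: expected critic value if only this agent's action is resampled.
    baseline = sum(p * q for p, q in zip(policy_probs, q_per_own_action))
    return q_per_own_action[taken_action] - baseline


# Illustrative values only.
critic_values = [1.0, 2.0, 4.0]
probs = [0.25, 0.25, 0.5]
advantage = counterfactual_advantage(critic_values, probs, taken_action=2)
print(advantage)
```

Because the baseline marginalises out only the agent's own action, the policy-weighted average of the advantage over that agent's actions is zero.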
Learning to Communicate with Deep Multi-Agent Reinforcement Learning
By embracing deep neural networks, this work is able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability.
The StarCraft Multi-Agent Challenge
- Mikayel Samvelyan, Tabish Rashid, +7 authors S. Whiteson
- Computer Science, Mathematics · AAMAS
- 11 February 2019
The StarCraft Multi-Agent Challenge (SMAC), based on the popular real-time strategy game StarCraft II, is proposed as a benchmark problem, and an open-source deep multi-agent RL framework including state-of-the-art algorithms is released.
Learning with Opponent-Learning Awareness
- Jakob N. Foerster, Richard Y. Chen, Maruan Al-Shedivat, S. Whiteson, P. Abbeel, Igor Mordatch
- Computer Science · AAMAS
- 13 September 2017
Results show that when two LOLA agents meet, tit-for-tat and therefore cooperation emerge in the iterated prisoner's dilemma, whereas independent learning does not produce this outcome. LOLA also receives higher payouts than a naive learner and is robust against exploitation by higher-order gradient-based methods.
A theoretical and empirical analysis of Expected Sarsa
- H. V. Seijen, H. V. Hasselt, S. Whiteson, M. Wiering
- Computer Science · IEEE Symposium on Adaptive Dynamic Programming…
- 15 May 2009
It is proved that Expected Sarsa converges under the same conditions as Sarsa, specific hypotheses are formulated about when Expected Sarsa will outperform Sarsa and Q-learning, and it is demonstrated that Expected Sarsa has significant advantages over these more commonly used methods.
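For readers comparing the three methods, the difference lies in the bootstrap target: Sarsa uses the value of the next action actually sampled, Q-learning uses the maximum over next actions, and Expected Sarsa uses the expectation over next actions under the current policy. A minimal tabular sketch follows, assuming an epsilon-greedy policy; the function names, the list-of-lists Q-table layout, and the default step size and discount are illustrative assumptions.

```python
def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an epsilon-greedy policy for one state."""
    n = len(q_row)
    probs = [epsilon / n] * n
    best = max(range(n), key=q_row.__getitem__)
    probs[best] += 1.0 - epsilon
    return probs


def expected_sarsa_update(q, state, action, reward, next_state,
                          next_probs, alpha=0.1, gamma=0.99):
    """Apply one Expected Sarsa update to q[state][action] in place."""
    # Expectation of the next state's values under the policy, rather than
    # the value of one sampled next action (Sarsa) or the max (Q-learning).
    expected_next = sum(p * v for p, v in zip(next_probs, q[next_state]))
    td_target = reward + gamma * expected_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Illustrative two-state, two-action table.
q = [[0.0, 0.0], [1.0, 3.0]]
probs = epsilon_greedy_probs(q[1], epsilon=1.0)  # uniform: [0.5, 0.5]
new_value = expected_sarsa_update(q, 0, 0, reward=1.0, next_state=1,
                                  next_probs=probs, alpha=0.5, gamma=1.0)
print(new_value)  # 1.5
```

With `epsilon=0` the policy is greedy, the expectation collapses to the maximum, and the update coincides with the Q-learning update.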
A Survey of Multi-Objective Sequential Decision-Making
- Diederik M. Roijers, P. Vamplew, S. Whiteson, R. Dazeley
- Mathematics, Computer Science · J. Artif. Intell. Res.
- 1 October 2013
This article surveys algorithms designed for sequential decision-making problems with multiple objectives and proposes a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function, and the type of policies considered.
Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning
Two methods enable the successful combination of experience replay with multi-agent RL: using a multi-agent variant of importance sampling to naturally decay obsolete data, and conditioning each agent's value function on a fingerprint that disambiguates the age of the data sampled from the replay memory.
LipNet: End-to-End Sentence-level Lipreading
This work presents LipNet, a model that maps a variable-length sequence of video frames to text, making use of spatiotemporal convolutions, a recurrent network, and the connectionist temporal classification loss, trained entirely end-to-end.
Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem
A sharp finite-time regret bound of order O(K log T), matching a lower bound proven in (Yue et al., 2012), is proved for a very general class of dueling bandit problems.