Deterministic Policy Gradient Algorithms
TLDR
We introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy.
  • 1,420
  • 239
Value-Decomposition Networks For Cooperative Multi-Agent Learning
TLDR
We study the problem of cooperative multi-agent reinforcement learning with a single joint reward signal.
  • 189
  • 39
Conditional mean embeddings as regressors
TLDR
We demonstrate an equivalence between reproducing kernel Hilbert space (RKHS) embeddings of conditional distributions and vector-valued regressors.
  • 86
  • 18
Modelling transition dynamics in MDPs with RKHS embeddings
TLDR
We propose a nonparametric approach to learning and representing transition dynamics in Markov decision processes (MDPs), which can be combined easily with dynamic programming methods for policy optimisation and value estimation.
  • 86
  • 16
Human-level performance in 3D multiplayer games with population-based reinforcement learning
TLDR
Artificial teamwork: Artificially intelligent agents are getting better and better at two-player games, but most real-world endeavors require teamwork.
  • 206
  • 12
Tighter PAC-Bayes bounds through distribution-dependent priors
TLDR
We use a localised PAC-Bayesian analysis to prove sharp risk bounds for stochastic exponential weights algorithms and develop insights into controlling function class complexity in this method.
  • 64
  • 10
Modelling Policies in MDPs in Reproducing Kernel Hilbert Space
TLDR
We present a framework for performing gradient-based policy optimization in the RKHS, deriving the functional gradient of the return for our policy, which has a simple form and can be estimated efficiently.
  • 23
  • 9
Nesterov's accelerated gradient and momentum as approximations to regularised update descent
TLDR
We show that a new algorithm, which we term Regularised Gradient Descent, can converge more quickly than either Nesterov's algorithm or the classical momentum algorithm, lending a new intuitive interpretation to the latter algorithm.
  • 58
  • 7
Predicting the Labelling of a Graph via Minimum $p$-Seminorm Interpolation
TLDR
We study the problem of predicting the labelling of a graph.
  • 48
  • 7
Human-level performance in first-person multiplayer games with population-based deep reinforcement learning
TLDR
We demonstrate for the first time that an agent can achieve human-level performance in a popular 3D multiplayer first-person video game, Quake III Arena Capture the Flag, using only pixels and game points as input.
  • 97
  • 5