• Publications
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning
TLDR
This work introduces Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning that performs on par with or better than the current state of the art on both transfer and semi-supervised benchmarks.
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
TLDR
A new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) is developed that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation.
A Distributional Perspective on Reinforcement Learning
TLDR
This paper argues for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent, and designs a new algorithm which applies Bellman's equation to the learning of approximate value distributions.
Unifying Count-Based Exploration and Intrinsic Motivation
TLDR
This work uses density models to measure uncertainty, and proposes a novel algorithm for deriving a pseudo-count from an arbitrary density model, which enables this technique to generalize count-based exploration algorithms to the non-tabular case.
Minimax Regret Bounds for Reinforcement Learning
We consider the problem of provably optimal exploration in reinforcement learning for finite horizon MDPs. We show that an optimistic modification to value iteration achieves a regret bound of
Best Arm Identification in Multi-Armed Bandits
TLDR
This work proposes a highly exploring UCB policy and a new algorithm based on successive rejects that are essentially optimal since their regret decreases exponentially at a rate which is, up to a logarithmic factor, the best possible.
Noisy Networks for Exploration
TLDR
It is found that replacing the conventional exploration heuristics for A3C, DQN and dueling agents with NoisyNet yields substantially higher scores for a wide range of Atari games, in some cases advancing the agent from sub to super-human performance.
Distributional Reinforcement Learning with Quantile Regression
TLDR
This work builds on a distributional approach to reinforcement learning, in which the distribution over returns is modeled explicitly instead of only estimating the mean, and presents a novel distributional reinforcement learning algorithm consistent with the theoretical formulation.
Learning to reinforcement learn
TLDR
This work introduces a novel approach to deep meta-reinforcement learning, which is a system that is trained using one RL algorithm, but whose recurrent dynamics implement a second, quite separate RL procedure.
Finite-Time Bounds for Fitted Value Iteration
TLDR
This work presents a theoretical analysis of the performance of sampling-based fitted value iteration (FVI) for solving infinite state-space, discounted-reward Markovian decision processes (MDPs) under the assumption that a generative model of the environment is available.