Publications
Gradient Surgery for Multi-Task Learning
TLDR
This work identifies a set of three conditions of the multi-task optimization landscape that cause detrimental gradient interference, and develops a simple yet general approach for avoiding such interference between task gradients.
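As a concrete illustration of the projection idea, here is a minimal NumPy sketch of "gradient surgery" between conflicting task gradients (the function name, random task ordering, and toy gradients are illustrative, not the authors' released implementation):

```python
import numpy as np

def pcgrad(task_grads, seed=None):
    """Project each task gradient onto the normal plane of any other task
    gradient it conflicts with (negative dot product), then sum the results."""
    rng = np.random.default_rng(seed)
    projected = [np.asarray(g, dtype=float).copy() for g in task_grads]
    for i, g_i in enumerate(projected):
        # Visit the other tasks in random order.
        for j in rng.permutation(len(task_grads)):
            if j == i:
                continue
            g_j = np.asarray(task_grads[j], dtype=float)
            dot = float(g_i @ g_j)
            if dot < 0.0:  # conflicting gradient directions
                g_i -= dot / (float(g_j @ g_j) + 1e-12) * g_j  # remove the conflicting component
    return np.sum(projected, axis=0)

# Toy usage with two conflicting task gradients.
combined = pcgrad([np.array([1.0, 0.0]), np.array([-1.0, 1.0])])
```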
DeepMDP: Learning Continuous Latent Space Models for Representation Learning
TLDR
This work introduces the DeepMDP, a parameterized latent space model trained by minimizing two tractable losses (prediction of rewards and prediction of the distribution over next latent states), and shows that optimizing these objectives guarantees the quality of the latent space as a representation of the state space.
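A rough sketch of the two losses, assuming an encoder, a reward head, and a deterministic latent transition model with an L2 distance standing in for the distributional formulation (all names and shapes are illustrative assumptions, not the paper's code):

```python
import torch
import torch.nn.functional as F

def deepmdp_losses(encoder, reward_head, latent_transition, s, a, r, s_next):
    """Latent-space losses: predict the reward and the next latent state.

    Assumed callables: encoder(s) -> z, reward_head(z, a) -> r_hat,
    latent_transition(z, a) -> z_next_hat.
    """
    z = encoder(s)
    z_next_target = encoder(s_next).detach()  # stop-gradient target, a common stabilization choice
    reward_loss = F.mse_loss(reward_head(z, a), r)
    transition_loss = F.mse_loss(latent_transition(z, a), z_next_target)
    return reward_loss + transition_loss
```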
Dopamine: A Research Framework for Deep Reinforcement Learning
TLDR
Dopamine is an open-source, TensorFlow-based framework providing compact and reliable implementations of several state-of-the-art deep RL agents, complemented by a taxonomy of the different research objectives in deep RL research.
One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL
TLDR
The key insight of this work is that learning diverse behaviors for accomplishing a task can directly lead to behavior that generalizes to varying environments, without needing to perform explicit perturbations during training.
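One common way to make "learning diverse behaviors" concrete is a latent-conditioned policy whose reward is augmented with a discriminability bonus; the sketch below shows such a bonus under that assumption (a generic construction, not the paper's exact objective):

```python
import torch
import torch.nn.functional as F

def diversity_bonus(discriminator, state, latent_z, num_latents):
    """Bonus log q(z | s) - log p(z): large when the visited state makes it
    easy to tell which latent-conditioned behavior produced it."""
    logits = discriminator(state)                      # shape [num_latents]
    log_q_z = F.log_softmax(logits, dim=-1)[latent_z]
    log_p_z = -torch.log(torch.tensor(float(num_latents)))  # uniform prior over z
    return log_q_z - log_p_z

# During training, the reward for the policy conditioned on z would be
# r_env + alpha * diversity_bonus(...), so each z learns a distinct way
# of solving the same task.
```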
Statistics and Samples in Distributional Reinforcement Learning
TLDR
This work presents a unifying framework for designing and analysing distributional reinforcement learning (DRL) algorithms in terms of recursively estimating statistics of the return distribution and develops a deep RL variant of the algorithm, ER-DQN, which is evaluated on the Atari-57 suite of games.
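For a flavor of recursively estimating statistics of the return distribution, the sketch below shows the asymmetric squared (expectile regression) loss that the ER-DQN name refers to; the toy arrays are illustrative:

```python
import numpy as np

def expectile_loss(predictions, targets, taus):
    """Asymmetric squared loss: residuals are weighted by tau when the target
    exceeds the prediction and by (1 - tau) otherwise; tau = 0.5 recovers
    ordinary least squares."""
    u = targets - predictions
    weight = np.where(u > 0, taus, 1.0 - taus)
    return np.mean(weight * u ** 2)

# Toy usage: fit three expectile statistics (tau = 0.25, 0.5, 0.75) of the
# return against a sampled Bellman target of 1.0.
loss = expectile_loss(np.zeros(3), np.full(3, 1.0), np.array([0.25, 0.5, 0.75]))
```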
Federated Control with Hierarchical Multi-Agent Deep Reinforcement Learning
TLDR
This work presents a framework combining hierarchical and multi-agent deep reinforcement learning approaches to solve coordination problems among a multitude of agents using a semi-decentralized model and shows promising initial experimental results on a simulated distributed scheduling problem.
Learning to Compose Skills
TLDR
A differentiable framework that learns a wide variety of compositions of simple policies, called skills, and can quickly build complex skills from simpler ones, allowing for zero-shot generalization.
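One simple differentiable composition, shown purely as an illustration: a state-dependent mixture over pre-trained skill policies (the class and shapes below are assumptions, not the paper's specific composition operators):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkillComposer(nn.Module):
    """Mixes the action logits of pre-trained skill policies with a learned,
    state-dependent weighting; the skills are expected to be frozen so that
    only the gating network is trained."""
    def __init__(self, skills, state_dim):
        super().__init__()
        self.skills = list(skills)                     # callables: state -> action logits
        self.gate = nn.Linear(state_dim, len(self.skills))

    def forward(self, state):                          # state: [state_dim]
        weights = F.softmax(self.gate(state), dim=-1)                   # [K]
        skill_logits = torch.stack([pi(state) for pi in self.skills])   # [K, A]
        return (weights.unsqueeze(-1) * skill_logits).sum(dim=0)        # [A]
```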
Multi-Task Reinforcement Learning without Interference
TLDR
This work develops a general approach that can change the multi-task optimization landscape to alleviate conflicting gradients across tasks and introduces two instantiations of this approach that prevent gradients for different tasks from interfering with one another.
Characterizing the Gap Between Actor-Critic and Policy Gradient
TLDR
This paper identifies the exact adjustment to the AC objective/gradient that recovers the true policy gradient of the cumulative reward objective (PG) and shows that the Stackelberg policy gradient can be recovered as a special case of the more general analysis.
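For orientation, the textbook decomposition below separates the true policy gradient into the usual actor-critic surrogate (with a learned critic Q_w) plus a term driven by the critic's error; it illustrates where such a gap lives, without claiming to be the paper's specific adjustment:

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{s \sim d^{\pi},\, a \sim \pi_\theta}\big[\nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi}(s,a)\big]
  = \underbrace{\mathbb{E}\big[\nabla_\theta \log \pi_\theta(a \mid s)\, Q_w(s,a)\big]}_{\text{actor-critic gradient}}
  + \underbrace{\mathbb{E}\big[\nabla_\theta \log \pi_\theta(a \mid s)\,\big(Q^{\pi}(s,a) - Q_w(s,a)\big)\big]}_{\text{gap due to critic error}}
```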
State Space Decomposition and Subgoal Creation for Transfer in Deep Reinforcement Learning
TLDR
A framework through which a deep RL agent learns to generalize policies from smaller, simpler domains to more complex ones using a recurrent attention mechanism, and shows that the meta-controller learns to create subgoals within the attended regions of the state space.