Publications
D4RL: Datasets for Deep Data-Driven Reinforcement Learning
TLDR
This work introduces benchmark tasks and datasets designed specifically for the offline RL setting, guided by key properties of datasets found in real-world applications of offline RL, and releases them together with a comprehensive evaluation of existing algorithms, an evaluation protocol, and an open-source codebase.
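As a sketch of how the released datasets are typically consumed (the environment name shown is one example task, and the exact dictionary keys can vary across D4RL versions):

    import gym
    import d4rl  # importing d4rl registers the offline benchmark environments with gym

    # Load one benchmark task and its accompanying offline dataset.
    env = gym.make('halfcheetah-medium-v2')   # example task name; versions may differ
    dataset = d4rl.qlearning_dataset(env)     # dict of numpy arrays
    # Typical keys: 'observations', 'actions', 'rewards', 'next_observations', 'terminals'
    print(dataset['observations'].shape)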
Data-Efficient Hierarchical Reinforcement Learning
TLDR
This paper studies how to develop hierarchical RL (HRL) algorithms that are general, in that they make no onerous assumptions beyond those of standard RL algorithms, and efficient, in that they require only modest numbers of interaction samples, making them suitable for real-world problems such as robotic control.
Behavior Regularized Offline Reinforcement Learning
TLDR
A general framework, behavior regularized actor critic (BRAC), is introduced to empirically evaluate recently proposed methods as well as a number of simple baselines across a variety of offline continuous control tasks.
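A minimal sketch of the behavior-regularization idea behind BRAC, assuming policy and critic objects with sample/log_prob/call interfaces (names and the choice of a sampled KL penalty are illustrative, not the paper's reference code):

    # Actor objective: maximize the critic value while penalizing divergence
    # between the learned policy and the (estimated) behavior policy.
    def brac_actor_loss(q, policy, behavior_policy, states, alpha=0.1):
        actions = policy.sample(states)
        # Sample-based estimate of KL(policy || behavior) at the given states.
        kl = (policy.log_prob(states, actions)
              - behavior_policy.log_prob(states, actions)).mean()
        return -q(states, actions).mean() + alpha * kl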
Learning to Remember Rare Events
TLDR
A large-scale lifelong memory module for use in deep learning that remembers training examples shown many thousands of steps in the past and can successfully generalize from them, demonstrating, for the first time, lifelong one-shot learning in recurrent neural networks on a large-scale machine translation task.
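A toy illustration of the key-value memory lookup such a module relies on (the shapes, cosine-similarity retrieval, and function name are assumptions for the sketch, not the paper's implementation):

    import numpy as np

    def memory_lookup(query, keys, values, k=4):
        # keys: (memory_size, d) unit-normalized embeddings; values: (memory_size,) stored labels
        query = query / np.linalg.norm(query)
        sims = keys @ query                  # cosine similarity to every stored key
        nearest = np.argsort(-sims)[:k]      # indices of the k most similar memories
        return values[nearest[0]], nearest   # predicted label plus neighbors for the memory update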
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
TLDR
This work proposes an algorithm, DualDICE, that is agnostic to knowledge of the behavior policy (or policies) used to generate the dataset and improves accuracy compared to existing techniques.
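In symbols, the quantity DualDICE estimates is the correction ratio between the policy's discounted stationary distribution and the data distribution (notation assumed here):

    w_{\pi/D}(s,a) \;=\; \frac{d^{\pi}(s,a)}{d^{D}(s,a)}, \qquad
    \rho(\pi) \;=\; \mathbb{E}_{(s,a)\sim d^{\pi}}\left[r(s,a)\right]
              \;=\; \mathbb{E}_{(s,a)\sim d^{D}}\left[w_{\pi/D}(s,a)\, r(s,a)\right],

so that the value of the target policy can be estimated from off-policy data without ever querying the behavior policy.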
MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks
TLDR
MorphNet iteratively shrinks and expands a network, shrinking via a resource-weighted sparsifying regularizer on activations and expanding via a uniform multiplicative factor on all layers; the approach is scalable to large networks, adaptable to specific resource constraints, and capable of improving the network's performance.
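A schematic of that shrink-then-expand loop (function and variable names are illustrative, not the released implementation):

    def morphnet_iteration(widths, cost, budget, train_with_sparsifier):
        # Shrink: train with a resource-weighted L1 sparsifier on per-unit activation
        # scales (e.g., batch-norm gammas) and drop units whose scale is driven to zero.
        widths = train_with_sparsifier(widths)
        # Expand: apply one uniform multiplicative factor to every layer until the
        # resource budget (e.g., FLOPs or model size, as measured by cost()) is reached.
        factor = 1.0
        while factor < 16 and cost([max(1, int(w * (factor + 0.05))) for w in widths]) <= budget:
            factor += 0.05
        return [max(1, int(w * factor)) for w in widths]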
Bridging the Gap Between Value and Policy Based Reinforcement Learning
TLDR
A new RL algorithm, Path Consistency Learning (PCL), is developed that minimizes a notion of soft consistency error along multi-step action sequences extracted from both on- and off-policy traces and significantly outperforms strong actor-critic and Q-learning baselines across several benchmarks.
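The soft consistency error that PCL drives to zero along a length-d sub-trajectory can be written as follows (notation assumed, with tau the entropy-regularization weight, V_phi the value function, and pi_theta the policy):

    C(s_{i:i+d}, \theta, \phi) \;=\; -V_{\phi}(s_i) \;+\; \gamma^{d} V_{\phi}(s_{i+d})
      \;+\; \sum_{j=0}^{d-1} \gamma^{j}\Bigl[ r(s_{i+j}, a_{i+j})
      \;-\; \tau \log \pi_{\theta}(a_{i+j} \mid s_{i+j}) \Bigr],

and PCL minimizes the squared error C^2/2 over sub-trajectories drawn from both on-policy rollouts and replayed experience.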
DeepMDP: Learning Continuous Latent Space Models for Representation Learning
TLDR
This work introduces the concept of a DeepMDP, a parameterized latent space model that is trained via the minimization of two tractable losses: prediction of rewards and prediction of the distribution over next latent states, and shows that the optimization of these objectives guarantees the quality of the latent space as a representation of the state space.
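A minimal sketch of the two tractable losses named above, using a deterministic latent transition head for simplicity (the paper predicts a distribution over next latent states; phi, R, and P are assumed PyTorch modules):

    import torch.nn.functional as F

    def deepmdp_losses(phi, R, P, s, a, r, s_next):
        z, z_next = phi(s), phi(s_next)                # encode observations into latent states
        reward_loss = F.mse_loss(R(z, a), r)           # loss 1: predict the observed reward
        transition_loss = F.mse_loss(P(z, a), z_next)  # loss 2: predict the next latent state
        return reward_loss + transition_loss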
AlgaeDICE: Policy Gradient from Arbitrary Experience
TLDR
A new formulation of max-return optimization that allows the problem to be re-expressed as an expectation over an arbitrary, behavior-agnostic, off-policy data distribution; it is shown that, if auxiliary dual variables of the objective are optimized, the gradient of the off-policy objective is exactly the on-policy policy gradient, without any use of importance weighting.
Imitation Learning via Off-Policy Distribution Matching
TLDR
This work shows how the original distribution ratio estimation objective may be transformed in a principled manner to yield a completely off-policy objective and calls the resulting algorithm ValueDICE, finding that it can achieve state-of-the-art sample efficiency and performance.
...
...