Corpus ID: 237532443

Conservative Data Sharing for Multi-Task Offline Reinforcement Learning

Tianhe Yu, Aviral Kumar, Yevgen Chebotar, Karol Hausman, Sergey Levine, Chelsea Finn
Offline reinforcement learning (RL) algorithms have shown promising results in domains where abundant pre-collected data is available. However, prior methods focus on solving individual problems from scratch with an offline dataset without considering how an offline RL agent can acquire multiple skills. We argue that a natural use case of offline RL is in settings where we can pool large amounts of data collected in various scenarios for solving different tasks, and utilize all of this data to… 
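The truncated abstract gestures at pooling data collected for different tasks. A minimal sketch of that setting might look like the following, where a transition logged for one task is relabeled with a target task's reward and shared only if it passes a conservatism check. All names here are hypothetical illustrations; the paper's actual routing rule is based on learned conservative Q-values, which the toy `value_estimate` threshold below merely stands in for.

```python
# Toy sketch of conservative data sharing across tasks.
# A transition logged for one task is relabeled with the target
# task's reward and shared only if a (stand-in) conservative
# value estimate says it helps.

from dataclasses import dataclass

@dataclass
class Transition:
    state: float
    action: float
    reward: float
    task: str

def relabel(t: Transition, target_task: str, reward_fn) -> Transition:
    """Rewrite a transition's reward under the target task's reward function."""
    return Transition(t.state, t.action, reward_fn(t.state, t.action), target_task)

def share_conservatively(buffer, target_task, reward_fn, value_estimate, threshold=0.0):
    """Keep the target task's own data; add relabeled foreign data
    only when its estimated value exceeds a conservative threshold."""
    shared = []
    for t in buffer:
        if t.task == target_task:
            shared.append(t)
        else:
            relabeled = relabel(t, target_task, reward_fn)
            if value_estimate(relabeled.state, relabeled.action) > threshold:
                shared.append(relabeled)
    return shared
```

Data from the target task is always kept, while foreign data must clear the threshold, which is the conservative asymmetry the title refers to.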
1 Citation


CALVIN: A Benchmark for Language-conditioned Policy Learning for Long-horizon Robot Manipulation Tasks
This paper presents CALVIN (Composing Actions from Language and Vision), an open-source simulated benchmark for learning long-horizon language-conditioned tasks, and suggests that there is significant room for developing innovative agents that learn to relate human language to their world models using this benchmark.


References

RL Unplugged: Benchmarks for Offline Reinforcement Learning
This paper proposes RL Unplugged, a suite of benchmarks for evaluating and comparing offline RL methods. The suite will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, making RL research both more systematic and more accessible across the community.
Representation Balancing Offline Model-based Reinforcement Learning
This paper addresses the curse of horizon exhibited by RepBM, which rejects most of the pre-collected data in long-horizon tasks, and presents a new objective for model learning, motivated by recent advances in the estimation of stationary distribution corrections, that effectively overcomes this limitation.
Representation Matters: Offline Pretraining for Sequential Decision Making
Through a variety of experiments utilizing standard offline RL datasets, it is found that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms that otherwise yield mediocre performance on their own.
Distral: Robust multitask reinforcement learning
This work proposes a new approach for joint training of multiple tasks, referred to as Distral (Distill & transfer learning), and shows that the proposed learning process is more robust and more stable, attributes that are critical in deep reinforcement learning.
Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
This work develops a novel class of off-policy batch RL algorithms that can effectively learn offline, without exploring, from a fixed batch of human interaction data; it uses models pre-trained on data as a strong prior and applies KL-control to penalize divergence from this prior during RL training.
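The KL-control idea described above can be sketched in a few lines: the environment reward is penalized by the policy's divergence from a fixed prior, discouraging drift away from the data distribution. This is a toy illustration with discrete distributions; `alpha` is a hypothetical penalty weight, not a value from the paper.

```python
# Sketch of KL-control in batch RL: penalize the reward by the
# policy's KL divergence from a fixed prior (e.g. a model
# pre-trained on human interaction data).

import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def kl_controlled_reward(reward, policy_probs, prior_probs, alpha=0.1):
    """Environment reward minus a scaled KL penalty to the prior."""
    return reward - alpha * kl_divergence(policy_probs, prior_probs)
```

When the policy matches the prior the penalty vanishes; the further it drifts, the larger the deduction from the reward.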
Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
A novel model-based algorithm, Behavior-Regularized Model-ENsemble (BREMEN), can effectively optimize a policy offline using 10-20 times less data than prior works, achieving impressive deployment efficiency while maintaining the same or better sample efficiency.
Behavior Regularized Offline Reinforcement Learning
A general framework, behavior regularized actor critic (BRAC), is introduced to empirically evaluate recently proposed methods as well as a number of simple baselines across a variety of offline continuous control tasks.
Sharing Knowledge in Multi-Task Deep Reinforcement Learning
This work studies the benefit of sharing representations among tasks to enable the effective use of deep neural networks in Multi-Task Reinforcement Learning, and extends the well-known finite-time bounds of Approximate Value-Iteration to the multi-task setting.
Generalized Hindsight for Reinforcement Learning
Compared to standard relabeling techniques, Generalized Hindsight provides a substantially more efficient reuse of samples, which is empirically demonstrated on a suite of multi-task navigation and manipulation tasks.
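The relabeling idea behind Generalized Hindsight can be sketched as follows: a trajectory collected while attempting one task is re-assigned to whichever task in a family scores it highest, so even "failed" data trains something. The task names and reward functions below are hypothetical stand-ins for whatever task family is at hand.

```python
# Sketch of hindsight task relabeling: assign a trajectory to the
# task under which it achieves the highest return.

def trajectory_return(trajectory, reward_fn):
    """Sum a task-specific reward over (state, action) pairs."""
    return sum(reward_fn(s, a) for s, a in trajectory)

def relabel_with_best_task(trajectory, task_rewards):
    """Pick the task name whose reward function scores this trajectory best."""
    return max(task_rewards, key=lambda name: trajectory_return(trajectory, task_rewards[name]))
```

A trajectory that drifted rightward while attempting a "go left" task would, under this rule, be relabeled as successful "go right" data rather than discarded.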
COMBO: Conservative Offline Model-Based Policy Optimization
A new model-based offline RL algorithm, COMBO, is developed that trains a value function using both the offline dataset and data generated by rollouts under the model, while additionally regularizing the value function on out-of-support state-action tuples generated via model rollouts, without requiring explicit uncertainty estimation.
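The conservative regularizer described in the COMBO summary can be sketched as a simple penalty term: push Q-values down on state-actions drawn from model rollouts and up on state-actions drawn from the offline dataset. The `q` callable below is a hypothetical critic used purely for illustration; in practice it would be a learned network trained alongside a standard Bellman error (omitted here).

```python
# Sketch of a COMBO-style conservative term: the mean Q over model
# rollout samples minus the mean Q over dataset samples, scaled by
# a weight. Minimizing this lowers Q on out-of-support model data
# relative to in-support offline data.

def conservative_penalty(q, model_samples, data_samples, beta=1.0):
    """beta * (mean Q over model rollouts - mean Q over the dataset)."""
    model_term = sum(q(s, a) for s, a in model_samples) / len(model_samples)
    data_term = sum(q(s, a) for s, a in data_samples) / len(data_samples)
    return beta * (model_term - data_term)
```

When model rollouts and dataset samples score identically the penalty is zero, which is why no explicit uncertainty estimate is needed: the asymmetry itself supplies the conservatism.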