Corpus ID: 237532443

Conservative Data Sharing for Multi-Task Offline Reinforcement Learning

@article{Yu2021ConservativeDS,
  title={Conservative Data Sharing for Multi-Task Offline Reinforcement Learning},
  author={Tianhe Yu and Aviral Kumar and Yevgen Chebotar and Karol Hausman and Sergey Levine and Chelsea Finn},
  journal={ArXiv},
  year={2021},
  volume={abs/2109.08128}
}
Offline reinforcement learning (RL) algorithms have shown promising results in domains where abundant pre-collected data is available. However, prior methods focus on solving individual problems from scratch with an offline dataset without considering how an offline RL agent can acquire multiple skills. We argue that a natural use case of offline RL is in settings where we can pool large amounts of data collected in various scenarios for solving different tasks, and utilize all of this data to…
1 Citation

CALVIN: A Benchmark for Language-conditioned Policy Learning for Long-horizon Robot Manipulation Tasks
General-purpose robots coexisting with humans in their environment must learn to relate human language to their perceptions and actions to be useful in a range of daily tasks. Moreover, they need to…

References

SHOWING 1-10 OF 94 REFERENCES
RL Unplugged: Benchmarks for Offline Reinforcement Learning
TLDR
This paper proposes a benchmark called RL Unplugged to evaluate and compare offline RL methods, a suite of benchmarks that will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community.
Representation Balancing Offline Model-based Reinforcement Learning
TLDR
This paper addresses the curse of horizon exhibited by RepBM, rejecting most of the pre-collected data in long-term tasks, and presents a new objective for model learning motivated by recent advances in the estimation of stationary distribution corrections, which effectively overcomes the aforementioned limitation of RepBM.
Representation Matters: Offline Pretraining for Sequential Decision Making
TLDR
Through a variety of experiments utilizing standard offline RL datasets, it is found that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms that otherwise yield mediocre performance on their own.
Distral: Robust multitask reinforcement learning
TLDR
This work proposes a new approach for joint training of multiple tasks, which it refers to as Distral (Distill & transfer learning), and shows that the proposed learning process is more robust and more stable---attributes that are critical in deep reinforcement learning.
Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
TLDR
This work develops a novel class of off-policy batch RL algorithms, able to effectively learn offline, without exploring, from a fixed batch of human interaction data, using models pre-trained on data as a strong prior, and uses KL-control to penalize divergence from this prior during RL training.
Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
TLDR
A novel model-based algorithm, Behavior-Regularized Model-ENsemble (BREMEN), that can effectively optimize a policy offline using 10-20 times fewer data than prior works, and is able to achieve impressive deployment efficiency while maintaining the same or better sample efficiency.
Behavior Regularized Offline Reinforcement Learning
TLDR
A general framework, behavior regularized actor critic (BRAC), is introduced to empirically evaluate recently proposed methods as well as a number of simple baselines across a variety of offline continuous control tasks.
Sharing Knowledge in Multi-Task Deep Reinforcement Learning
TLDR
This work studies the benefit of sharing representations among tasks to enable the effective use of deep neural networks in Multi-Task Reinforcement Learning, and extends the well-known finite-time bounds of Approximate Value-Iteration to the multi-task setting.
Generalized Hindsight for Reinforcement Learning
TLDR
Compared to standard relabeling techniques, Generalized Hindsight provides a substantially more efficient reuse of samples, which is empirically demonstrated on a suite of multi-task navigation and manipulation tasks.
COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning
TLDR
It is shown that even when the prior data does not actually succeed at solving the new task, it can still be utilized for learning a better policy, by providing the agent with a broader understanding of the mechanics of its environment.