Corpus ID: 237407042

A Workflow for Offline Model-Free Robotic Reinforcement Learning

@inproceedings{Kumar2021AWF,
  title={A Workflow for Offline Model-Free Robotic Reinforcement Learning},
  author={Aviral Kumar and Anikait Singh and Stephen Tian and Chelsea Finn and Sergey Levine},
  booktitle={CoRL},
  year={2021}
}
Abstract: Offline reinforcement learning (RL) enables learning control policies by utilizing only prior experience, without any online interaction. This can allow robots to acquire generalizable skills from large and diverse datasets, without any costly or unsafe online data collection. Despite recent algorithmic advances in offline RL, applying these methods to real-world problems has proven challenging. Although offline RL methods can learn from prior data, there is no clear and well-understood process…

Citations

The Reflective Explorer: Online Meta-Exploration from Offline Data in Realistic Robotic Tasks
TLDR
This work develops the first offline model-based meta-RL algorithm that operates from images in tasks with sparse rewards, and shows that this method completely solves a realistic meta-learning task involving robot manipulation, while naive combinations of previous approaches fail.
Should I Run Offline Reinforcement Learning or Behavioral Cloning?
TLDR
This work characterizes the properties of environments that allow offline RL methods to perform better than BC methods even when only provided with expert data, and shows that policies trained on suboptimal data that is sufficiently noisy can attain better performance than even BC algorithms with expert data, especially on long-horizon problems.
When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning?
TLDR
This work characterizes the properties of environments that allow offline RL methods to perform better than BC methods, even when only provided with expert data, and shows that policies trained on sufficiently noisy suboptimal data can attain better performance than even BC algorithms with expert data, especially on long-horizon problems.
A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems
TLDR
This work proposes a unifying taxonomy to classify offline RL methods, provides a comprehensive review of the latest algorithmic breakthroughs in the field, and reviews existing benchmarks’ properties and shortcomings.
Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning
TLDR
It is suggested that data generation is as important as algorithmic advances for offline RL and hence requires careful consideration from the community, and that exploratory data allows vanilla off-policy RL algorithms to outperform or match state-of-the-art offline RL algorithms on downstream tasks.
Hierarchical Planning Through Goal-Conditioned Offline Reinforcement Learning
TLDR
A hierarchical planning framework is proposed, consisting of a low-level goal-conditioned RL policy and a high-level goal planner, together with a Conditional Variational Autoencoder to sample meaningful high-dimensional sub-goal candidates and to solve the high-level long-term strategy optimization problem.
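
The sub-goal sampling step described above can be sketched roughly as drawing latents from a standard normal prior and decoding them conditioned on the current state. This is only an illustrative Python sketch; cvae_decoder, state, and the dimensions are assumed placeholder names, not the paper's interfaces.

    import torch

    def sample_subgoal_candidates(cvae_decoder, state, num_candidates=32, latent_dim=16):
        """Sample sub-goal candidates from a trained conditional VAE (sketch).

        Latents are drawn from a standard normal prior and decoded conditioned
        on the current state, producing high-dimensional sub-goal proposals for
        the high-level planner to score. The decoder interface is hypothetical.
        """
        z = torch.randn(num_candidates, latent_dim)
        state_rep = state.unsqueeze(0).expand(num_candidates, -1)
        return cvae_decoder(z, state_rep)   # (num_candidates, goal_dim)
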
Bayesian Imitation Learning for End-to-End Mobile Manipulation
TLDR
This work investigates and demonstrates the benefits of a Bayesian approach to imitation learning from multiple sensor inputs, as applied to the task of opening office doors with a mobile manipulator, and shows that using the Variational Information Bottleneck to regularize convolutional neural networks improves generalization to held-out domains and reduces the sim-to-real gap in a sensor-agnostic manner.
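
The Variational Information Bottleneck regularizer mentioned here can be sketched as an imitation term plus a KL penalty on a stochastic latent. The encoder/policy_head interfaces and the MSE imitation term below are assumptions for illustration, not the paper's implementation.

    import torch
    import torch.nn.functional as F

    def vib_imitation_loss(encoder, policy_head, obs, expert_actions, beta=1e-3):
        """Imitation loss regularized by a Variational Information Bottleneck (sketch).

        The encoder outputs a stochastic latent (mean, log_std); an imitation
        term on the decoded action is combined with a KL penalty toward a
        standard normal, limiting how much input-specific information the
        latent carries. All interfaces here are illustrative assumptions.
        """
        mu, log_std = encoder(obs)
        z = mu + log_std.exp() * torch.randn_like(mu)   # reparameterized sample
        imitation = F.mse_loss(policy_head(z), expert_actions)
        kl = 0.5 * (mu.pow(2) + (2 * log_std).exp() - 2 * log_std - 1).sum(-1).mean()
        return imitation + beta * kl
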
RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning
TLDR
RAMBO is presented, a novel approach to model-based offline RL that addresses the problem as a two-player zero-sum game against an adversarial environment model, resulting in a PAC performance guarantee and a pessimistic value function that lower-bounds the value function in the true environment.
Know Your Boundaries: The Necessity of Explicit Behavioral Cloning in Offline RL
TLDR
A theoretical framework is suggested for incorporating behavior-cloned models into value-based offline RL methods, enjoying the strengths of both explicit behavior cloning and value learning, and a practical method utilizing a score-based generative model for behavior cloning is proposed.
Vision-Based Manipulators Need to Also See from Their Hands
TLDR
This work systematically analyzes the benefits of putting cameras in the hands of robots and provides simple and broadly applicable insights for improving end-to-end learned vision-based robotic manipulation.
...

References

Showing 1-10 of 78 references
Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills
TLDR
This work proposes the objective of learning a functional understanding of the environment by learning to reach any goal state in a given dataset, employing goal-conditioned Q-learning with hindsight relabeling, and develops several techniques that enable training in a particularly challenging offline setting.
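
The hindsight relabeling referenced in this summary can be sketched as sampling future states from the same trajectory and reusing them as goals with a sparse reward. This is a minimal Python approximation of the idea, not the paper's code; all names are illustrative.

    import random

    def relabel_with_hindsight(trajectory, num_samples=4):
        """Relabel an unrewarded trajectory into goal-conditioned transitions.

        For each step, sample states from the future of the same trajectory and
        reuse them as goals, with a sparse reward when the next state is the
        sampled goal. Illustrative sketch only.
        """
        transitions = []
        for t in range(len(trajectory) - 1):
            obs, action = trajectory[t]
            next_obs, _ = trajectory[t + 1]
            for _ in range(num_samples):
                g_idx = random.randint(t + 1, len(trajectory) - 1)
                goal = trajectory[g_idx][0]
                reward = 1.0 if g_idx == t + 1 else 0.0
                transitions.append((obs, action, goal, reward, next_obs))
        return transitions
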
Accelerating Online Reinforcement Learning with Offline Datasets
TLDR
A novel algorithm is proposed that combines sample-efficient dynamic programming with maximum likelihood policy updates, providing a simple and effective framework that is able to leverage large amounts of offline data and then quickly perform online fine-tuning of reinforcement learning policies.
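
The "maximum likelihood policy update" described here can be read as an advantage-weighted regression step, sketched below under assumed PyTorch-style interfaces: policy(obs) returns a torch distribution whose log_prob gives one value per sample, and q1/q2 map (obs, act) to Q-values. All names are placeholders, not the released implementation.

    import torch

    def advantage_weighted_policy_loss(policy, q1, q2, batch, temperature=1.0):
        """Advantage-weighted maximum-likelihood policy update (sketch).

        The log-likelihood of dataset actions is weighted by exp(A / temperature),
        so the policy imitates the actions the learned critic prefers.
        """
        obs, act = batch["observations"], batch["actions"]
        dist = policy(obs)
        with torch.no_grad():
            q_data = torch.min(q1(obs, act), q2(obs, act))
            pi_act = dist.sample()
            value = torch.min(q1(obs, pi_act), q2(obs, pi_act))
            weights = torch.exp((q_data - value) / temperature).clamp(max=100.0)
        return -(dist.log_prob(act) * weights).mean()
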
A Minimalist Approach to Offline Reinforcement Learning
TLDR
It is shown that the performance of state-of-the-art offline RL algorithms can be matched by simply adding a behavior cloning term to the policy update of an online RL algorithm and normalizing the data, and that the resulting algorithm is a baseline that is simple to implement and tune.
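
A minimal sketch of the kind of policy objective described (a deterministic policy-gradient term plus a behavior-cloning term with an adaptive weight) is shown below. The actor/critic interfaces are assumptions, and the observation normalization mentioned in the summary is omitted.

    import torch
    import torch.nn.functional as F

    def actor_loss_with_bc(actor, critic, obs, dataset_actions, alpha=2.5):
        """Deterministic policy-gradient loss plus a behavior-cloning term (sketch).

        The Q term is rescaled by an adaptive weight so it stays on a scale
        comparable to the MSE behavior-cloning term.
        """
        pi = actor(obs)                        # deterministic action for each state
        q = critic(obs, pi)                    # Q(s, pi(s))
        lam = alpha / q.abs().mean().detach()  # adaptive weight on the RL term
        bc = F.mse_loss(pi, dataset_actions)   # imitation of dataset actions
        return -lam * q.mean() + bc
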
MOReL: Model-Based Offline Reinforcement Learning
TLDR
Theoretically, it is shown that MOReL is minimax optimal (up to log factors) for offline RL, and through experiments, it matches or exceeds state-of-the-art results in widely studied offline RL benchmarks.
COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning
TLDR
It is shown that even when the prior data does not actually succeed at solving the new task, it can still be utilized for learning a better policy, by providing the agent with a broader understanding of the mechanics of its environment.
MOPO: Model-based Offline Policy Optimization
TLDR
A new model-based offline RL algorithm is proposed that applies the variance of a Lipschitz-regularized model as a penalty to the reward function, and it is found that this algorithm outperforms both standard model-based RL methods and existing state-of-the-art model-free offline RL approaches on existing offline RL benchmarks, as well as two challenging continuous control tasks.
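
One simple way to instantiate a model-uncertainty penalty of the kind described is to subtract ensemble disagreement from the model's predicted reward. The sketch below assumes a hypothetical ensemble of models with a predict(obs, act) method; it is one simple penalty choice, not the paper's exact formulation.

    import numpy as np

    def uncertainty_penalized_reward(ensemble, obs, act, lam=1.0):
        """Penalize model-predicted reward by ensemble disagreement (sketch).

        Each ensemble member predicts (next_obs_mean, next_obs_std, reward);
        disagreement between members is subtracted from the mean reward so the
        policy avoids regions the model is unsure about.
        """
        preds = [m.predict(obs, act) for m in ensemble]
        next_obs_means = np.stack([p[0] for p in preds])   # (n_models, obs_dim)
        rewards = np.stack([p[2] for p in preds])          # (n_models,)
        disagreement = np.linalg.norm(next_obs_means.std(axis=0))
        return rewards.mean() - lam * disagreement
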
Conservative Q-Learning for Offline Reinforcement Learning
TLDR
Conservative Q-learning (CQL) is proposed, which aims to address limitations of offline RL methods by learning a conservative Q-function such that the expected value of a policy under this Q-function lower-bounds its true value.
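
A rough sketch of the conservative penalty described, in the spirit of the CQL(H) regularizer: push down a logsumexp of Q-values over broadly sampled actions while pushing up Q-values on dataset actions, adding the result to the usual Bellman error with a coefficient. Uniform action sampling in [-1, 1] and the q_net interface are simplifying assumptions.

    import torch

    def conservative_penalty(q_net, obs, dataset_actions, num_sampled=10):
        """Conservative Q-learning style penalty (rough sketch).

        Lowers Q-values on sampled out-of-distribution actions relative to
        Q-values on actions actually present in the dataset.
        """
        batch_size, act_dim = dataset_actions.shape
        sampled = torch.empty(batch_size, num_sampled, act_dim).uniform_(-1.0, 1.0)
        obs_rep = obs.unsqueeze(1).expand(-1, num_sampled, -1)
        q_sampled = q_net(obs_rep.reshape(batch_size * num_sampled, -1),
                          sampled.reshape(batch_size * num_sampled, -1))
        q_sampled = q_sampled.reshape(batch_size, num_sampled)
        q_data = q_net(obs, dataset_actions)
        return torch.logsumexp(q_sampled, dim=1).mean() - q_data.mean()
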
Offline Reinforcement Learning from Images with Latent Space Models
TLDR
This work proposes to learn a latent-state dynamics model and to represent uncertainty in the latent space of the model's predictions, and significantly outperforms previous offline model-free RL methods as well as state-of-the-art online visual model-based RL methods.
Continuous Doubly Constrained Batch Reinforcement Learning
TLDR
An algorithm is developed for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interaction with the environment; it compares favorably to state-of-the-art methods regardless of how the offline data were collected.
Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
TLDR
This work develops a novel class of off-policy batch RL algorithms, able to effectively learn offline, without exploring, from a fixed batch of human interaction data, using models pre-trained on data as a strong prior, and uses KL-control to penalize divergence from this prior during RL training.
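
The KL-control idea described can be sketched as a per-step reward correction that penalizes divergence from the pre-trained prior. The single-sample KL estimate and argument names below are illustrative assumptions, not the paper's implementation.

    def kl_controlled_reward(reward, policy_logp, prior_logp, kl_weight=0.1):
        """Per-step reward shaped by divergence from a pre-trained prior (sketch).

        policy_logp and prior_logp are log-probabilities of the taken action
        under the learned policy and the prior model; their difference is a
        single-sample estimate of the KL divergence.
        """
        return reward - kl_weight * (policy_logp - prior_logp)
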
...