Corpus ID: 250451656

A Dataset Perspective on Offline Reinforcement Learning

@inproceedings{Schweighofer2021ADP,
  title={A Dataset Perspective on Offline Reinforcement Learning},
  author={Kajetan Schweighofer and Andreas Radler and Marius-Constantin Dinu and Markus Hofmarcher and Vihang Patil and Angela Bitto-Nemling and Hamid Eghbal-zadeh and Sepp Hochreiter},
  year={2021}
}
The application of Reinforcement Learning (RL) in real-world environments can be expensive or risky due to sub-optimal policies during training. Offline RL avoids this problem, since interactions with the environment are prohibited: policies are learned from a given dataset, which solely determines their performance. Despite this, how dataset characteristics influence Offline RL algorithms has hardly been investigated. The dataset characteristics are determined by the behavioral policy…

References

Showing 1-10 of 62 references

Regularized Behavior Value Estimation

This work introduces Regularized Behavior Value Estimation (R-BVE), which estimates the value of the behavior policy during training and performs policy improvement only at deployment time; a ranking regularisation term favours actions in the dataset that lead to successful outcomes.
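As a rough illustration of the behavior-value-estimation step only (not the authors' implementation; the ranking regularisation term is omitted, and names such as q_net, target_net and the batch layout are assumptions), a SARSA-style target evaluated on dataset transitions might look as follows:

```python
import torch
import torch.nn.functional as F

def behavior_value_loss(q_net, target_net, batch, gamma=0.99):
    """TD loss that evaluates the behavior policy on dataset transitions."""
    obs, actions, rewards, next_obs, next_actions, dones = batch
    q_taken = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrap on the action actually taken next in the dataset (SARSA-style),
        # so the estimate tracks the behavior policy instead of a greedy improvement.
        next_q = target_net(next_obs).gather(1, next_actions.unsqueeze(1)).squeeze(1)
        td_target = rewards + gamma * (1.0 - dones) * next_q
    return F.mse_loss(q_taken, td_target)
```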

Conservative Q-Learning for Offline Reinforcement Learning

Conservative Q-learning (CQL) is proposed, which aims to address the limitations of offline RL methods by learning a conservative Q-function such that the expected value of a policy under this Q-function lower-bounds its true value.
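A minimal sketch of that conservative penalty for a discrete-action Q-network, assuming illustrative names (q_net, target_net, cql_alpha) and a standard (obs, actions, rewards, next_obs, dones) batch:

```python
import torch
import torch.nn.functional as F

def cql_loss(q_net, target_net, batch, gamma=0.99, cql_alpha=1.0):
    """Standard TD loss plus a term that pushes Q-values down for all actions
    while keeping them up for the actions actually present in the dataset."""
    obs, actions, rewards, next_obs, dones = batch
    q_all = q_net(obs)                                        # (B, num_actions)
    q_taken = q_all.gather(1, actions.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        next_q = target_net(next_obs).max(dim=1).values
        td_target = rewards + gamma * (1.0 - dones) * next_q
    td_loss = F.mse_loss(q_taken, td_target)

    # Conservative term: log-sum-exp over all actions minus the dataset action's value.
    conservative = (torch.logsumexp(q_all, dim=1) - q_taken).mean()
    return td_loss + cql_alpha * conservative
```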

D4RL: Datasets for Deep Data-Driven Reinforcement Learning

This work introduces benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL, and releases benchmark tasks and datasets with a comprehensive evaluation of existing algorithms and an evaluation protocol together with an open-source codebase.
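A typical way to load one of these datasets with the d4rl package (the environment name and exact API may vary across versions) looks roughly like this:

```python
import gym
import d4rl  # importing d4rl registers the offline environments with gym

env = gym.make("halfcheetah-medium-v2")
dataset = d4rl.qlearning_dataset(env)  # numpy arrays keyed by observations, actions, ...

print(dataset["observations"].shape, dataset["actions"].shape, dataset["rewards"].shape)
```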

An Optimistic Perspective on Offline Reinforcement Learning

It is demonstrated that recent off-policy deep RL algorithms, even when trained solely on this replay dataset, outperform the fully trained DQN agent. Random Ensemble Mixture (REM), a robust Q-learning algorithm that enforces optimal Bellman consistency on random convex combinations of multiple Q-value estimates, is also presented.
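A minimal sketch of the REM idea, assuming a Q-network with K heads that returns a (batch, K, num_actions) tensor; all names are illustrative:

```python
import torch
import torch.nn.functional as F

def rem_loss(q_heads, target_heads, batch, gamma=0.99):
    """Bellman consistency enforced on a random convex combination of K Q-heads."""
    obs, actions, rewards, next_obs, dones = batch
    q = q_heads(obs)                                  # (B, K, num_actions)
    K = q.shape[1]

    alphas = torch.rand(K, device=q.device)
    alphas = alphas / alphas.sum()                    # random convex combination weights

    q_mix = (q * alphas.view(1, K, 1)).sum(dim=1)     # (B, num_actions)
    q_taken = q_mix.gather(1, actions.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        next_q = (target_heads(next_obs) * alphas.view(1, K, 1)).sum(dim=1)
        td_target = rewards + gamma * (1.0 - dones) * next_q.max(dim=1).values
    return F.smooth_l1_loss(q_taken, td_target)
```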

A Minimalist Approach to Offline Reinforcement Learning

It is shown that the performance of state-of-the-art offline RL algorithms can be matched by simply adding a behavior cloning term to the policy update of an online RL algorithm and normalizing the data; the resulting algorithm is a baseline that is simple to implement and tune.
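The actor update behind this idea can be sketched in a few lines; names (actor, critic, bc_weight) are illustrative assumptions, and the Q-term scaling roughly follows the normalization described in the paper:

```python
import torch
import torch.nn.functional as F

def minimalist_actor_loss(actor, critic, obs, dataset_actions, bc_weight=2.5):
    """Deterministic policy-gradient term plus a behavior-cloning term."""
    pi = actor(obs)
    q = critic(obs, pi)
    # Scale the Q term by its average magnitude so the BC term stays comparable.
    lam = bc_weight / q.abs().mean().detach()
    return -(lam * q).mean() + F.mse_loss(pi, dataset_actions)
```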

Critic Regularized Regression

This paper proposes a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR), and finds that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces, outperforming several state-of-the-art offline RL algorithms by a significant margin on a wide range of benchmark tasks.
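In spirit, CRR is advantage-weighted behavior cloning; a rough sketch of an exponential-weighting variant, with all names (actor, critic, beta) as assumptions:

```python
import torch

def crr_loss(actor, critic, obs, actions, beta=1.0, num_value_samples=4):
    """actor(obs) is assumed to return a torch.distributions object over full actions;
    critic(obs, a) is assumed to return a (B,) tensor of Q-values."""
    dist = actor(obs)
    with torch.no_grad():
        # Estimate V(s) by averaging Q over actions sampled from the current policy.
        sampled = dist.sample((num_value_samples,))                  # (N, B, act_dim)
        v = torch.stack([critic(obs, a) for a in sampled]).mean(0)   # (B,)
        advantage = critic(obs, actions) - v
        weight = torch.clamp((advantage / beta).exp(), max=20.0)     # exp-weighted variant
    # Regression onto dataset actions, weighted by how good the critic thinks they are.
    return -(weight * dist.log_prob(actions)).mean()
```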

Distributional Reinforcement Learning with Quantile Regression

This paper examines methods of learning the value distribution instead of the value function in reinforcement learning, and presents a novel distributional reinforcement learning algorithm consistent with the theoretical formulation.
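The core of the algorithm is a quantile regression (Huber) loss between predicted and target return quantiles; a minimal sketch, assuming the network outputs N quantiles for the chosen action and that all names are illustrative:

```python
import torch

def quantile_huber_loss(pred_quantiles, target_quantiles, kappa=1.0):
    """pred_quantiles, target_quantiles: (B, N) estimates of N return quantiles."""
    N = pred_quantiles.shape[1]
    taus = (torch.arange(N, device=pred_quantiles.device, dtype=torch.float32) + 0.5) / N

    # Pairwise TD errors: td[b, i, j] = target_j - prediction_i.
    td = target_quantiles.unsqueeze(1) - pred_quantiles.unsqueeze(2)    # (B, N, N)
    abs_td = td.abs()
    huber = torch.where(abs_td <= kappa, 0.5 * td ** 2, kappa * (abs_td - 0.5 * kappa))

    # Asymmetric weight |tau - 1{td < 0}| turns the Huber loss into quantile regression.
    weight = (taus.view(1, N, 1) - (td.detach() < 0).float()).abs()
    return (weight * huber / kappa).mean(dim=2).sum(dim=1).mean()
```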

Offline Reinforcement Learning Hands-On

This work experimentally validates that diversity and high-return examples in the data are crucial to the success of offline RL, and shows that behavioural cloning remains a strong contender compared to its contemporaries.

Behavior Regularized Offline Reinforcement Learning

A general framework, behavior regularized actor critic (BRAC), is introduced to empirically evaluate recently proposed methods as well as a number of simple baselines across a variety of offline continuous control tasks.
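One instantiation of the framework penalizes the divergence between the learned policy and a model of the behavior policy directly in the actor objective; a minimal sketch under that assumption (actor, critic, behavior, and alpha are illustrative names):

```python
import torch

def brac_actor_loss(actor, critic, behavior, obs, alpha=0.1):
    """actor(obs) and behavior(obs) are assumed to return torch.distributions objects;
    critic(obs, a) is assumed to return a (B,) tensor of Q-values."""
    dist = actor(obs)
    action = dist.rsample()                           # reparameterized sample
    q = critic(obs, action)
    # Sample-based estimate of the KL divergence to the (pre-trained) behavior model.
    kl = dist.log_prob(action) - behavior(obs).log_prob(action)
    return (-q + alpha * kl).mean()
```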

Off-Policy Deep Reinforcement Learning without Exploration

This paper introduces a novel class of off-policy algorithms, batch-constrained reinforcement learning, which restricts the action space in order to force the agent towards behaving close to on-policy with respect to a subset of the given data.
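For the discrete-action case, batch-constrained action selection can be sketched as masking out actions that a learned behavior model considers unlikely; names (q_net, behavior_net, threshold) are illustrative:

```python
import torch

def batch_constrained_action(q_net, behavior_net, obs, threshold=0.3):
    """Greedy action restricted to candidates that are sufficiently likely under the data."""
    q = q_net(obs)                                         # (B, num_actions)
    probs = torch.softmax(behavior_net(obs), dim=1)        # imitation of the dataset policy
    # Keep only actions whose likelihood, relative to the most likely one, exceeds threshold.
    allowed = probs / probs.max(dim=1, keepdim=True).values > threshold
    masked_q = torch.where(allowed, q, torch.full_like(q, float("-inf")))
    return masked_q.argmax(dim=1)
```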
...