A Dataset Perspective on Offline Reinforcement Learning
@inproceedings{Schweighofer2021ADP,
  title  = {A Dataset Perspective on Offline Reinforcement Learning},
  author = {Kajetan Schweighofer and Andreas Radler and Marius-Constantin Dinu and Markus Hofmarcher and Vihang Patil and Angela Bitto-Nemling and Hamid Eghbal-zadeh and Sepp Hochreiter},
  year   = {2021}
}
The application of Reinforcement Learning (RL) in real-world environments can be expensive or risky due to sub-optimal policies during training. In Offline RL, this problem is avoided, since interactions with an environment are prohibited: policies are learned from a given dataset, which solely determines their performance. Despite this fact, how dataset characteristics influence Offline RL algorithms has hardly been investigated. The dataset characteristics are determined by the behavioral policy…
References
Showing 1-10 of 62 references
Regularized Behavior Value Estimation
- Computer Science, ArXiv, 2021
This work introduces Regularized Behavior Value Estimation (R-BVE), which estimates the value of the behavior policy during training and only performs policy improvement at deployment time, and uses a ranking regularisation term that favours actions in the dataset that lead to successful outcomes.
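The heart of behavior value estimation is a SARSA-style target that bootstraps from the next action actually taken in the dataset, rather than a greedy maximum. A minimal PyTorch sketch for discrete actions, assuming hypothetical `q_net`/`target_q_net` modules and a `(s, a, r, s_next, a_next, done)` batch layout; the ranking regularisation term is omitted:

```python
import torch
import torch.nn.functional as F

def behavior_value_loss(q_net, target_q_net, batch, gamma=0.99):
    # a_next is the action the behavior policy actually took in the
    # next state (hypothetical batch layout, assumed float `done` flags).
    s, a, r, s_next, a_next, done = batch
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # SARSA-style bootstrap from the dataset's next action, so Q
        # estimates the behavior policy's value, not the greedy one.
        q_next = target_q_net(s_next).gather(1, a_next.unsqueeze(1)).squeeze(1)
        target = r + gamma * (1.0 - done) * q_next
    return F.mse_loss(q, target)
```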
Conservative Q-Learning for Offline Reinforcement Learning
- Computer Science, NeurIPS, 2020
Conservative Q-learning (CQL) is proposed, which aims to address limitations of offline RL methods by learning a conservative Q-function such that the expected value of a policy under this Q-function lower-bounds its true value.
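For discrete actions, the CQL(H) regulariser has a compact form: push down a log-sum-exp over all action values while pushing up the values of dataset actions, added on top of a standard TD loss. A minimal PyTorch sketch; the module names and batch layout are assumptions:

```python
import torch
import torch.nn.functional as F

def cql_loss(q_net, target_q_net, batch, gamma=0.99, alpha=1.0):
    s, a, r, s_next, done = batch
    q_all = q_net(s)                                    # (B, num_actions)
    q_data = q_all.gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * target_q_net(s_next).max(dim=1).values
    td_loss = F.mse_loss(q_data, target)
    # Conservative term: lower all Q-values (logsumexp) relative to the
    # Q-values of actions that actually appear in the dataset.
    conservative = (torch.logsumexp(q_all, dim=1) - q_data).mean()
    return td_loss + alpha * conservative
```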
D4RL: Datasets for Deep Data-Driven Reinforcement Learning
- Computer Science, ArXiv, 2020
This work introduces benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL, and releases benchmark tasks and datasets with a comprehensive evaluation of existing algorithms and an evaluation protocol together with an open-source codebase.
An Optimistic Perspective on Offline Reinforcement Learning
- Computer Science, ICML, 2020
It is demonstrated that recent off-policy deep RL algorithms, even when trained solely on this replay dataset, outperform the fully trained DQN agent. In addition, Random Ensemble Mixture (REM), a robust Q-learning algorithm that enforces optimal Bellman consistency on random convex combinations of multiple Q-value estimates, is presented.
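REM's key operation is training a random convex combination of K Q-value heads to satisfy the Bellman optimality equation. A sketch assuming a hypothetical `q_heads(s)` that returns a `(B, K, num_actions)` tensor; mixture weights are drawn per transition here, though drawing one mixture per mini-batch also works:

```python
import torch
import torch.nn.functional as F

def rem_loss(q_heads, target_heads, batch, gamma=0.99):
    s, a, r, s_next, done = batch
    q = q_heads(s)                                      # (B, K, num_actions)
    B, K, _ = q.shape
    # Random convex combination: Dirichlet samples lie on the simplex.
    w = torch.distributions.Dirichlet(torch.ones(K)).sample((B,))  # (B, K)
    q_mix = (w.unsqueeze(-1) * q).sum(dim=1)            # (B, num_actions)
    q_pred = q_mix.gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        tq_mix = (w.unsqueeze(-1) * target_heads(s_next)).sum(dim=1)
        target = r + gamma * (1.0 - done) * tq_mix.max(dim=1).values
    return F.smooth_l1_loss(q_pred, target)
```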
A Minimalist Approach to Offline Reinforcement Learning
- Computer Science, NeurIPS, 2021
It is shown that the performance of state-of-the-art offline RL algorithms can be matched by simply adding a behavior cloning term to the policy update of an online RL algorithm and normalizing the data; the resulting algorithm is a simple-to-implement and easy-to-tune baseline.
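The entire modification fits in the actor loss: a behavior cloning penalty plus a normalization of the Q term, with lambda = alpha / mean|Q|. A sketch with assumed `actor`/`critic` modules; the paper additionally normalizes the dataset's states, which is omitted here:

```python
import torch.nn.functional as F

def td3_bc_actor_loss(actor, critic, s, a_data, alpha=2.5):
    pi = actor(s)
    q = critic(s, pi)
    # Normalize the Q term so alpha has a consistent meaning across tasks.
    lam = alpha / q.abs().mean().detach()
    # Maximize Q while staying close to dataset actions (the BC term).
    return -lam * q.mean() + F.mse_loss(pi, a_data)
```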
Critic Regularized Regression
- Computer Science, NeurIPS, 2020
This paper proposes a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR), and finds that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces, outperforming several state-of-the-art offline RL algorithms by a significant margin on a wide range of benchmark tasks.
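CRR is weighted behavior cloning: the log-likelihood of dataset actions is weighted by a function of the estimated advantage. A sketch of the binary ("indicator") variant with a Monte-Carlo value estimate, assuming a stochastic `actor` that returns a torch distribution and a `critic` returning `(B, 1)` values:

```python
import torch

def crr_loss(actor, critic, s, a_data, n_samples=4):
    dist = actor(s)  # assumed to return a torch.distributions object
    with torch.no_grad():
        q = critic(s, a_data).squeeze(-1)
        # Monte-Carlo estimate of V(s) under the current policy.
        v = torch.stack([critic(s, dist.sample()).squeeze(-1)
                         for _ in range(n_samples)]).mean(dim=0)
        weight = (q > v).float()          # clone only advantageous actions
    return -(weight * dist.log_prob(a_data)).mean()
```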
Distributional Reinforcement Learning with Quantile Regression
- Computer Science, AAAI, 2018
This paper examines methods of learning the value distribution instead of the value function in reinforcement learning, and presents a novel distributional reinforcement learning algorithm consistent with the theoretical formulation.
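Quantile regression replaces the expected-value head with N quantile estimates, trained with an asymmetric quantile Huber loss against Bellman target samples. A sketch of that loss (shapes noted in the docstring are assumptions):

```python
import torch

def quantile_huber_loss(pred, target, kappa=1.0):
    """pred: (B, N) predicted return quantiles; target: (B, M) target samples."""
    N = pred.shape[1]
    taus = (torch.arange(N, dtype=pred.dtype) + 0.5) / N   # quantile midpoints
    u = target.unsqueeze(1) - pred.unsqueeze(2)            # (B, N, M) TD errors
    huber = torch.where(u.abs() <= kappa,
                        0.5 * u.pow(2),
                        kappa * (u.abs() - 0.5 * kappa))
    # Asymmetric weighting: under/over-estimation penalized according to tau.
    weight = (taus.view(1, N, 1) - (u.detach() < 0).float()).abs()
    return (weight * huber / kappa).mean()
```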
Offline Reinforcement Learning Hands-On
- Computer Science, ArXiv, 2020
This work experimentally validate that diversity and high-return examples in the data are crucial to the success of offline RL and show that behavioural cloning remains a strong contender compared to its contemporaries.
Behavior Regularized Offline Reinforcement Learning
- Computer Science, ArXiv, 2019
A general framework, behavior regularized actor critic (BRAC), is introduced to empirically evaluate recently proposed methods as well as a number of simple baselines across a variety of offline continuous control tasks.
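In its policy-regularization form, BRAC subtracts a divergence between the learned policy and an estimate of the behavior policy from the actor objective (the framework also studies penalizing the critic target instead, omitted here). A sketch using a KL penalty, assuming policy modules that return torch distributions of a registered pair (e.g. Gaussians):

```python
import torch

def brac_actor_loss(actor, critic, behavior_policy, s, alpha=0.1):
    dist = actor(s)
    a = dist.rsample()                    # reparameterized action sample
    # Penalize divergence from (an estimate of) the behavior policy.
    kl = torch.distributions.kl_divergence(dist, behavior_policy(s)).mean()
    return -critic(s, a).mean() + alpha * kl
```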
Off-Policy Deep Reinforcement Learning without Exploration
- Computer Science, ICML, 2019
This paper introduces a novel class of off-policy algorithms, batch-constrained reinforcement learning, which restricts the action space in order to force the agent towards behaving close to on-policy with respect to a subset of the given data.
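Batch-constrained action selection works by sampling candidate actions from a generative model fit to the dataset, applying a small learned perturbation, and executing the candidate with the highest Q-value. A sketch with hypothetical `vae_decoder`, `perturb`, and `q_net` modules and a single 1-D `state` vector:

```python
import torch

def bcq_select_action(state, vae_decoder, perturb, q_net, n_candidates=10):
    # Evaluate several candidate actions for a single state.
    s = state.unsqueeze(0).repeat(n_candidates, 1)
    a = vae_decoder(s)          # actions likely under the dataset's behavior policy
    a = a + perturb(s, a)       # small learned adjustment, keeping a near the data
    q = q_net(s, a).squeeze(-1)
    return a[q.argmax()]        # greedy over the constrained candidate set
```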