A Behavior Regularized Implicit Policy for Offline Reinforcement Learning
@inproceedings{Yang2022ABR,
  title  = {A Behavior Regularized Implicit Policy for Offline Reinforcement Learning},
  author = {Shentao Yang and Zhendong Wang and Huangjie Zheng and Yihao Feng and Mingyuan Zhou},
  year   = {2022}
}
Offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment. The lack of environmental interactions makes the policy training vulnerable to state-action pairs far from the training dataset and prone to missing rewarding actions. To train more effective agents, we propose a framework that supports learning a flexible yet well-regularized fully-implicit policy. We further propose a simple modification to the classical policy-matching…
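To make the idea concrete, below is a minimal sketch (not the paper's exact objective) of what a behavior-regularized, fully-implicit policy can look like: the policy maps a state together with a noise vector to an action, so its action distribution is defined only through samples, and the actor loss combines critic maximization with a sample-based policy-matching penalty. The Gaussian-kernel MMD regularizer, the critic interface `critic(states, actions)`, the network sizes, and the weight `alpha` are all illustrative assumptions, not the paper's specific choices.

```python
# Hypothetical sketch of a behavior-regularized implicit policy (assumptions noted above).
import torch
import torch.nn as nn

class ImplicitPolicy(nn.Module):
    def __init__(self, state_dim, action_dim, noise_dim=8, hidden=256):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(state_dim + noise_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        # Fresh noise per call makes the action distribution implicit:
        # it is defined only through samples, not a tractable density.
        noise = torch.randn(state.shape[0], self.noise_dim, device=state.device)
        return self.net(torch.cat([state, noise], dim=-1))

def gaussian_mmd(x, y, sigma=1.0):
    # Simple (biased) Gaussian-kernel MMD estimate between two sample batches.
    def k(a, b):
        d2 = ((a.unsqueeze(1) - b.unsqueeze(0)) ** 2).sum(-1)
        return torch.exp(-d2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def policy_loss(policy, critic, states, dataset_actions, alpha=1.0):
    # Behavior-regularized objective: maximize Q while matching dataset actions.
    pi_actions = policy(states)
    return -critic(states, pi_actions).mean() + alpha * gaussian_mmd(pi_actions, dataset_actions)
```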
References
Showing references 1-10 of 80.
Conservative Q-Learning for Offline Reinforcement Learning
- Computer Science, NeurIPS, 2020
Conservative Q-learning (CQL) is proposed, which aims to address the limitations of offline RL methods by learning a conservative Q-function such that the expected value of a policy under this Q-function lower-bounds its true value.
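As a schematic reminder of the mechanism (notation simplified from the CQL paper; the general form also maximizes over and regularizes the sampling distribution μ), the conservative term pushes Q-values down on actions drawn from a wide distribution μ and up on dataset actions, on top of the usual Bellman error:

```latex
\[
\min_{Q}\;
\alpha\Big(\mathbb{E}_{s\sim\mathcal{D},\,a\sim\mu(\cdot|s)}\big[Q(s,a)\big]
 - \mathbb{E}_{(s,a)\sim\mathcal{D}}\big[Q(s,a)\big]\Big)
 + \tfrac{1}{2}\,\mathbb{E}_{(s,a,s')\sim\mathcal{D}}
 \Big[\big(Q(s,a)-\hat{\mathcal{B}}^{\pi}\hat{Q}(s,a)\big)^{2}\Big]
\]
```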
Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
- Computer Science, NeurIPS, 2019
A practical algorithm, bootstrapping error accumulation reduction (BEAR), is proposed and shown to learn robustly from different off-policy distributions, including random and suboptimal demonstrations, on a range of continuous control tasks.
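Schematically, and in simplified form (BEAR actually combines the minimum and maximum over the critic ensemble, and the kernel and threshold ε are implementation choices), the policy update maximizes a conservative value estimate subject to a sampled MMD constraint that keeps the policy near the unknown behavior distribution:

```latex
\[
\pi \leftarrow \arg\max_{\pi}\;
\mathbb{E}_{s\sim\mathcal{D},\,a\sim\pi(\cdot|s)}\Big[\min_{j=1,\dots,K} Q_{j}(s,a)\Big]
\quad\text{s.t.}\quad
\mathbb{E}_{s\sim\mathcal{D}}\Big[\mathrm{MMD}\big(\mathcal{D}(\cdot|s),\,\pi(\cdot|s)\big)\Big]\le\varepsilon
\]
```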
Addressing Function Approximation Error in Actor-Critic Methods
- Computer Science, ICML, 2018
This paper builds on Double Q-learning by taking the minimum value between a pair of critics to limit overestimation, and draws the connection between target networks and overestimation bias.
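The clipped double-Q target from this paper (TD3) is the piece most relevant to the offline methods above; in its standard form, with target-policy smoothing noise clipped to [-c, c]:

```latex
\[
y = r + \gamma \min_{i=1,2} Q_{\theta_i'}\big(s',\,\tilde{a}\big),
\qquad
\tilde{a} = \pi_{\phi'}(s') + \epsilon,\quad
\epsilon \sim \mathrm{clip}\big(\mathcal{N}(0,\sigma),\,-c,\,c\big)
\]
```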
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
- Computer Science, ICLR, 2016
This work introduces a class of CNNs called deep convolutional generative adversarial networks (DCGANs), which have certain architectural constraints, and demonstrates that they are a strong candidate for unsupervised representation learning.
Offline Reinforcement Learning with Implicit Q-Learning
- Computer Science, ICLR, 2022
This work proposes implicit Q-learning (IQL), a new offline RL method that never needs to evaluate actions outside of the dataset, yet still enables the learned policy to improve substantially over the best behavior in the data through generalization.
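Concretely, IQL avoids querying out-of-dataset actions by fitting a state-value function with expectile regression on dataset actions only and bootstrapping the Q-target from it (policy extraction is done separately via advantage-weighted regression); written schematically, up to notation:

```latex
\[
L_V(\psi) = \mathbb{E}_{(s,a)\sim\mathcal{D}}\Big[L_2^{\tau}\big(Q_{\hat{\theta}}(s,a) - V_{\psi}(s)\big)\Big],
\qquad L_2^{\tau}(u) = \big|\tau - \mathbf{1}\{u<0\}\big|\,u^{2},
\]
\[
L_Q(\theta) = \mathbb{E}_{(s,a,s')\sim\mathcal{D}}\Big[\big(r(s,a) + \gamma V_{\psi}(s') - Q_{\theta}(s,a)\big)^{2}\Big]
\]
```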
Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble
- Computer Science, NeurIPS, 2021
This work proposes an uncertainty-based offline RL method that takes into account the confidence of the Q-value prediction and does not require any estimation or sampling of the data distribution, and shows that clipped Q-learning, a technique widely used in online RL, can be leveraged to penalize OOD data points with high prediction uncertainty.
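The key quantity is the ensemble-clipped target: with N critics, taking the minimum over the ensemble penalizes actions whose Q-estimates disagree (high epistemic uncertainty). A simplified target, with the entropy term of the underlying SAC-style update omitted, looks like:

```latex
\[
y = r + \gamma\,\min_{j=1,\dots,N} Q_{\theta_j'}\big(s',\,a'\big),
\qquad a' \sim \pi_{\phi}(\cdot\,|\,s')
\]
```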
OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation
- Computer Science, ICML, 2021
This paper presents an offline RL algorithm, OptiDICE, that directly estimates the stationary distribution corrections of the optimal policy and, unlike previous offline RL algorithms, does not rely on policy gradients.
A Minimalist Approach to Offline Reinforcement Learning
- Computer Science, NeurIPS, 2021
It is shown that the performance of state-of-the-art offline RL algorithms can be matched by simply adding a behavior cloning term to the policy update of an online RL algorithm and normalizing the data; the resulting algorithm is a simple-to-implement and easy-to-tune baseline.
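The modified policy update from that paper (TD3+BC) is essentially a one-line change to TD3's actor loss, adding a behavior-cloning term and normalizing Q by its average magnitude over the minibatch (up to notation):

```latex
\[
\pi = \arg\max_{\pi}\;
\mathbb{E}_{(s,a)\sim\mathcal{D}}\Big[\lambda\, Q\big(s,\pi(s)\big) - \big(\pi(s)-a\big)^{2}\Big],
\qquad
\lambda = \frac{\alpha}{\frac{1}{N}\sum_{(s_i,a_i)}\big|Q(s_i,a_i)\big|}
\]
```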
Offline Reinforcement Learning with Fisher Divergence Critic Regularization
- Computer Science, ICML, 2021
This work parameterizes the critic as the log of the behavior policy that generated the offline data, plus a state-action value offset term that can be learned with a neural network; the resulting algorithm, termed Fisher-BRC (Behavior Regularized Critic), achieves both improved performance and faster convergence over existing state-of-the-art methods.
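In schematic form, the critic is an offset on top of the log behavior policy, and the offset's action-gradient is penalized, giving a sample-based Fisher-divergence (gradient-penalty) regularizer; the weight λ and the exact sampling distribution are implementation details:

```latex
\[
Q_{\theta}(s,a) = O_{\theta}(s,a) + \log \pi_{\beta}(a\,|\,s),
\qquad
\text{regularizer: }\;
\lambda\,\mathbb{E}_{s\sim\mathcal{D},\,a\sim\pi(\cdot|s)}\Big[\big\|\nabla_{a} O_{\theta}(s,a)\big\|^{2}\Big]
\]
```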
Implicit Distributional Reinforcement Learning
- Computer Science, NeurIPS, 2020
An implicit distributional actor-critic is proposed that consists of a distributional critic, built on two deep generator networks, and a semi-implicit actor (SIA) powered by a flexible policy distribution, to improve the sample efficiency of policy-gradient-based reinforcement learning algorithms.