Corpus ID: 244478308

UMBRELLA: Uncertainty-Aware Model-Based Offline Reinforcement Learning Leveraging Planning

Christopher P. Diehl, Timo Sievernich, Martin Krüger, Frank Hoffmann, Torsten Bertram
Offline reinforcement learning (RL) provides a framework for learning decision-making from offline data and therefore constitutes a promising approach for real-world applications such as automated driving. Self-driving vehicles (SDVs) learn a policy that can potentially even outperform the behavior in the sub-optimal data set. Especially in safety-critical applications such as automated driving, explainability and transferability are key to success. This motivates the use of model-based offline RL approaches… 


COMBO: Conservative Offline Model-Based Policy Optimization
A new model-based offline RL algorithm, COMBO, is developed that trains a value function using both the offline dataset and data generated via rollouts under the learned model, while additionally regularizing the value function on out-of-support state-action tuples produced by those rollouts, without requiring explicit uncertainty estimation.
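The conservative regularization described above can be sketched as a critic loss: a standard Bellman error term plus a regularizer that pushes Q-values down on model-generated samples and up on dataset samples. This is a hypothetical simplification for illustration (the function name, `beta` weight, and array-based interface are assumptions, not COMBO's actual implementation):

```python
import numpy as np

def combo_critic_loss(q_dataset, q_model_rollout, bellman_error, beta=1.0):
    """Sketch of a COMBO-style conservative critic objective:
    mean squared Bellman error plus a term that penalizes high Q-values
    on model rollouts (out-of-support) relative to dataset samples,
    avoiding any explicit uncertainty estimate."""
    conservative_term = np.mean(q_model_rollout) - np.mean(q_dataset)
    return np.mean(bellman_error ** 2) + beta * conservative_term
```

With `beta = 0` this reduces to ordinary fitted Q-learning; larger `beta` trades return for pessimism on states the model invents.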
Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control
A plan online, learn offline (POLO) framework for the setting where an agent with an internal model must continually act and learn in the world; the work shows how trajectory optimization can perform temporally coordinated exploration in conjunction with uncertainty estimation in value function approximation.
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
This work proposes an algorithm, DualDICE, that is agnostic to knowledge of the behavior policy (or policies) used to generate the dataset and improves accuracy compared to existing techniques.
Hyperparameter Selection for Offline Reinforcement Learning
This work focuses on offline hyperparameter selection, i.e., methods for choosing the best policy from a set of many policies trained with different hyperparameters, given only logged data, and shows that offline RL algorithms are not robust to hyperparameter choices.
Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic
This work proposes to train a policy by unrolling a learned model of the environment dynamics over multiple time steps while explicitly penalizing two costs: the original cost the policy seeks to optimize, and an uncertainty cost which represents its divergence from the states it is trained on.
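The idea above — unrolling a learned model and penalizing both the task cost and the policy's divergence from familiar states — can be sketched by averaging a small ensemble of stochastic dynamics samples and using their disagreement (variance) as the uncertainty cost. All names (`rollout_loss`, `lam`, the list-of-models interface) are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def rollout_loss(policy, dynamics_samples, s0, horizon, task_cost, lam=0.5):
    """Sketch of uncertainty-regularized policy learning: unroll several
    samples of a learned dynamics model and accumulate, at each step,
    the task cost on the mean predicted state plus lam times the
    variance across model samples (a proxy for divergence from the
    states the model was trained on)."""
    total = 0.0
    states = [s0 for _ in dynamics_samples]
    for _ in range(horizon):
        action = policy(states[0])
        states = [f(s, action) for f, s in zip(dynamics_samples, states)]
        preds = np.stack(states)
        total += task_cost(preds.mean(axis=0)) + lam * preds.var(axis=0).sum()
    return total
```

When the model samples agree (low variance) the objective reduces to the original task cost; disagreement inflates the loss, discouraging the policy from steering into poorly modeled regions.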
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
This paper proposes a new iterative algorithm that trains a stationary deterministic policy, can be viewed as a no-regret algorithm in an online learning setting, and demonstrates that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.
Exploring the Limitations of Behavior Cloning for Autonomous Driving
It is shown that behavior cloning leads to state-of-the-art results, executing complex lateral and longitudinal maneuvers even in unseen environments without being explicitly programmed to do so, while some limitations of the behavior cloning approach are also confirmed.
BADGR: An Autonomous Self-Supervised Learning-Based Navigation System
The reinforcement learning approach, which the authors call BADGR, is an end-to-end learning-based mobile robot navigation system that can be trained with autonomously-labeled off-policy data gathered in real-world environments, without any simulation or human supervision.
End-to-End Driving Via Conditional Imitation Learning
This work evaluates different architectures for conditional imitation learning in vision-based driving and conducts experiments in realistic three-dimensional simulations of urban driving and on a 1/5 scale robotic truck that is trained to drive in a residential area.
An Algorithmic Perspective on Imitation Learning
This work provides an introduction to imitation learning, dividing imitation learning into directly replicating desired behavior and learning the hidden objectives of the desired behavior from demonstrations (called inverse optimal control or inverse reinforcement learning [Russell, 1998]).