Corpus ID: 223953534

Safe Model-based Reinforcement Learning with Robust Cross-Entropy Method

Zuxin Liu, Hongyi Zhou, Baiming Chen, Sicheng Zhong, Martial Hebert, Ding Zhao
This paper studies the safe reinforcement learning (RL) problem without assuming prior knowledge of the system dynamics or the constraint function. We employ an uncertainty-aware neural network ensemble model to learn the dynamics, and we infer the unknown constraint function from indicator signals of constraint violations. We use model predictive control (MPC) as the basic control framework and propose the robust cross-entropy method (RCE) to optimize the control sequence considering…
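The cross-entropy method at the core of such an MPC loop can be sketched as below. This is a minimal vanilla-CEM sketch with illustrative names and parameters; the paper's robust variant additionally accounts for model uncertainty and the inferred constraints, which are omitted here:

```python
import numpy as np

def cem_plan(cost_fn, horizon, act_dim, iters=5, pop=500, elite_frac=0.1):
    """Vanilla cross-entropy method for MPC action-sequence optimization.

    cost_fn maps a batch of action sequences of shape (pop, horizon, act_dim)
    to costs of shape (pop,). All names and hyperparameters here are
    illustrative, not the paper's implementation.
    """
    mean = np.zeros((horizon, act_dim))
    std = np.ones((horizon, act_dim))
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = mean + std * np.random.randn(pop, horizon, act_dim)
        samples = np.clip(samples, -1.0, 1.0)          # respect action bounds
        costs = cost_fn(samples)
        elites = samples[np.argsort(costs)[:n_elite]]  # lowest-cost sequences
        # refit the sampling distribution to the elite set
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean  # in MPC, execute mean[0] and replan at the next step
```

In a receding-horizon loop, only the first action of the returned sequence is executed before replanning, which is what lets the controller react to the learned model's updated predictions.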


Learning Off-Policy with Online Planning
This work proposes Learning Off-Policy with Online Planning (LOOP), which combines techniques from model-based and model-free reinforcement learning, and introduces "actor-guided" trajectory optimization to mitigate the actor-divergence issue that arises in the combined method.
Reinforcement Learning Guided by Double Replay Memory
This article proposes a framework with a doubly used experience replay memory that exploits both important transitions and new transitions simultaneously, and demonstrates its applicability to reinforcement learning with discrete action spaces (e.g., computer game environments).
Accelerated Policy Evaluation: Learning Adversarial Environments with Adaptive Importance Sampling
The Accelerated Policy Evaluation method is proposed, which simultaneously uncovers rare events and estimates the rare event probability in Markov decision processes and is scalable to large discrete or continuous spaces by incorporating function approximators.
Planning with Learned Dynamic Model for Unsupervised Point Cloud Registration
This paper develops a latent dynamic model of point clouds, consisting of a transformation network and evaluation network, and employs the cross-entropy method (CEM) to iteratively update the planning policy by maximizing the rewards in the point cloud registration process.


Constrained Cross-Entropy Method for Safe Reinforcement Learning
  • Min Wen, U. Topcu
  • Computer Science
    IEEE Transactions on Automatic Control
  • 2021
This work proposes a constrained cross-entropy-based method that transforms the original constrained optimization problem into an unconstrained one with a surrogate objective, and shows that it effectively learns feasible policies without assumptions on the feasibility of the initial policies, even with non-Markovian objective and constraint functions.
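The feasibility-first elite ranking behind such a surrogate objective is often realized roughly as follows; this is a sketch under assumed names, not the authors' exact formulation:

```python
import numpy as np

def constrained_elites(samples, obj_costs, cons_costs, limit, n_elite):
    """Elite selection in the spirit of constrained CEM.

    Prefer feasible samples (constraint cost <= limit) ranked by objective
    cost; if fewer than n_elite are feasible, fill the elite set with the
    least-violating infeasible samples. Names are illustrative.
    """
    feasible = cons_costs <= limit
    order_feas = np.argsort(obj_costs[feasible])
    elite_idx = np.where(feasible)[0][order_feas][:n_elite]
    if len(elite_idx) < n_elite:
        infeas = np.where(~feasible)[0]
        fill = infeas[np.argsort(cons_costs[infeas])][: n_elite - len(elite_idx)]
        elite_idx = np.concatenate([elite_idx, fill])
    return samples[elite_idx]
```

The key property is that the selection rule never trades feasibility for objective value: an infeasible sample enters the elite set only when there are not enough feasible ones to refit the sampling distribution.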
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
This paper proposes a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation, which matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks, while requiring significantly fewer samples.
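The trajectory-sampling idea in PETS — propagating particles through randomly chosen ensemble members so that epistemic (model) uncertainty disperses them — can be sketched roughly as below; names and the particle count are illustrative, not the reference implementation:

```python
import numpy as np

def ts_rollout(ensemble, s0, actions, rng, n_particles=20):
    """Trajectory-sampling rollout through an ensemble of dynamics models.

    `ensemble` is a list of callables f(state, action) -> next_state. At every
    step, each particle is propagated through one randomly chosen ensemble
    member, so disagreement between members spreads the particles apart.
    """
    particles = np.repeat(s0[None, :], n_particles, axis=0)
    traj = [particles.copy()]
    for a in actions:
        idx = rng.integers(len(ensemble), size=len(particles))
        particles = np.stack([ensemble[i](s, a) for i, s in zip(idx, particles)])
        traj.append(particles.copy())
    return np.stack(traj)  # shape: (horizon + 1, n_particles, state_dim)
```

An MPC planner would score a candidate action sequence by averaging the reward (or cost) over the resulting particle cloud, so that action sequences whose outcomes the ensemble disagrees about are evaluated under that disagreement rather than under a single point prediction.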
Responsive Safety in Reinforcement Learning by PID Lagrangian Methods
This work proposes a novel Lagrange multiplier update method that utilizes derivatives of the constraint function, and introduces a new method to ease controller tuning by providing invariance to the relative numerical scales of reward and cost.
Learning-Based Model Predictive Control for Safe Exploration
This paper presents a learning-based model predictive control scheme that can provide provable high-probability safety guarantees and exploits regularity assumptions on the dynamics in terms of a Gaussian process prior to construct provably accurate confidence intervals on predicted trajectories.
Safe Model-based Reinforcement Learning with Stability Guarantees
This paper presents a learning algorithm that explicitly considers safety, defined in terms of stability guarantees, and extends control-theoretic results on Lyapunov stability verification and shows how to use statistical models of the dynamics to obtain high-performance control policies with provable stability certificates.
A Lyapunov-based Approach to Safe Reinforcement Learning
This work defines and presents a method for constructing Lyapunov functions, which provide an effective way to guarantee the global safety of a behavior policy during training via a set of local, linear constraints.
Safe Reinforcement Learning in Constrained Markov Decision Processes
This paper proposes an algorithm, SNO-MDP, that explores and optimizes Markov decision processes under unknown safety constraints and takes a stepwise approach for optimizing safety and cumulative reward.
Variational Inference MPC for Bayesian Model-based Reinforcement Learning
This work introduces a variational inference MPC framework that reformulates various stochastic methods, including CEM, in a Bayesian fashion, and proposes a novel instance of the framework, probabilistic action ensembles with trajectory sampling (PaETS), which can capture multimodal uncertainty in both the dynamics and the optimal trajectories.
Lyapunov-based Safe Policy Optimization for Continuous Control
This work presents safe policy optimization algorithms based on a Lyapunov approach for continuous-action reinforcement learning problems in which it is crucial that the agent interacts with the environment only through safe policies, i.e., policies that do not take the agent to undesirable situations.