Corpus ID: 223953534

Safe Model-based Reinforcement Learning with Robust Cross-Entropy Method

@article{Liu2020SafeMR,
  title={Safe Model-based Reinforcement Learning with Robust Cross-Entropy Method},
  author={Zuxin Liu and Hongyi Zhou and Baiming Chen and Sicheng Zhong and Martial Hebert and Ding Zhao},
  journal={ArXiv},
  year={2020},
  volume={abs/2010.07968}
}
This paper studies the safe reinforcement learning (RL) problem without assumptions about prior knowledge of the system dynamics and the constraint function. We employ an uncertainty-aware neural network ensemble model to learn the dynamics, and we infer the unknown constraint function through indicator constraint violation signals. We use model predictive control (MPC) as the basic control framework and propose the robust cross-entropy method (RCE) to optimize the control sequence considering…
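The abstract describes the planning loop only at a high level, so below is a minimal, hypothetical sketch of how a constraint-aware cross-entropy method can sit inside an MPC loop with a learned dynamics ensemble; it is not the authors' RCE implementation. The callables `ensemble_step`, `reward_fn`, and `cost_fn` are assumed placeholders for the learned ensemble model, the reward estimate, and the inferred constraint function.

```python
import numpy as np

def robust_cem_plan(state, ensemble_step, reward_fn, cost_fn,
                    n_members=5, horizon=15, pop_size=400,
                    n_elite=40, n_iters=5, act_dim=2,
                    act_low=-1.0, act_high=1.0, seed=0):
    """Constraint-aware CEM planner sketch for an MPC loop.

    Assumed placeholder interfaces (not from the paper):
      ensemble_step(states, action) -> list of next states, one per
        learned dynamics-ensemble member.
      reward_fn(state, action)      -> scalar reward estimate.
      cost_fn(state)                -> scalar constraint signal
                                       (> 0 indicates a violation).
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, act_dim))
    std = 0.5 * np.ones((horizon, act_dim))

    for _ in range(n_iters):
        # Sample candidate action sequences from the current Gaussian.
        samples = np.clip(
            mean + std * rng.standard_normal((pop_size, horizon, act_dim)),
            act_low, act_high)

        returns = np.zeros(pop_size)
        worst_cost = np.full(pop_size, -np.inf)
        for i, seq in enumerate(samples):
            states = [state.copy() for _ in range(n_members)]
            for a in seq:
                # Average predicted reward over ensemble members;
                # track the worst-case (pessimistic) constraint cost.
                returns[i] += np.mean([reward_fn(s, a) for s in states])
                worst_cost[i] = max(worst_cost[i],
                                    max(cost_fn(s) for s in states))
                states = ensemble_step(states, a)

        # Prefer feasible sequences; if too few are feasible, fall back
        # to the least-violating ones (a common heuristic, not
        # necessarily the paper's exact selection rule).
        feasible = np.where(worst_cost <= 0.0)[0]
        if len(feasible) >= n_elite:
            order = np.argsort(-returns[feasible])[:n_elite]
            elites = samples[feasible[order]]
        else:
            elites = samples[np.argsort(worst_cost)[:n_elite]]

        # Refit the sampling distribution to the elite sequences.
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + 1e-3

    # MPC executes only the first action, then replans at the next state.
    return mean[0]
```

In this sketch, robustness enters through scoring each candidate sequence with the worst constraint value across ensemble members; other choices, such as a quantile of the predicted cost distribution, fit the same skeleton.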

Citations of this paper

Learning Off-Policy with Online Planning
TLDR: This work proposes Learning Off-Policy with Online Planning (LOOP), combining the techniques from model-based and model-free reinforcement learning algorithms, and introduces "actor-guided" trajectory optimization to mitigate the actor-divergence issue in the proposed method.
Reinforcement Learning Guided by Double Replay Memory
TLDR: This article proposes a framework that accommodates doubly used experience replay memory, exploiting both important transitions and new transitions simultaneously, and demonstrates its applicability to reinforcement learning with discrete action spaces (e.g., computer game environments).
Accelerated Policy Evaluation: Learning Adversarial Environments with Adaptive Importance Sampling
TLDR: The Accelerated Policy Evaluation method is proposed, which simultaneously uncovers rare events and estimates the rare-event probability in Markov decision processes, and is scalable to large discrete or continuous spaces by incorporating function approximators.
Planning with Learned Dynamic Model for Unsupervised Point Cloud Registration
  • Haobo Jiang, Jin Xie, Jianjun Qian, Jian Yang
  • Computer Science
  • IJCAI
  • 2021
TLDR: This paper develops a latent dynamic model of point clouds, consisting of a transformation network and an evaluation network, and employs the cross-entropy method (CEM) to iteratively update the planning policy by maximizing the rewards in the point cloud registration process.

References

Showing 1–10 of 37 references
Constrained Cross-Entropy Method for Safe Reinforcement Learning
  • Min Wen, U. Topcu
  • Computer Science
  • IEEE Transactions on Automatic Control
  • 2021
This work studies a safe reinforcement learning problem in which the constraints are defined as the expected cost over finite-length trajectories.
TLDR: A constrained cross-entropy-based method is proposed that transforms the original constrained optimization problem into an unconstrained one with a surrogate objective; it effectively learns feasible policies without assumptions on the feasibility of initial policies, even with non-Markovian objective and constraint functions.
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
TLDR: This paper proposes a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation, which matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks while requiring significantly fewer samples.
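For contrast with the constraint-aware planner sketched above, the following is a rough, hypothetical illustration of the trajectory-sampling idea summarized here, not the PETS reference implementation; `models` and `reward_fn` are assumed stand-ins for a learned probabilistic dynamics ensemble and a reward model.

```python
import numpy as np

def ts_return(state, action_seq, models, reward_fn, n_particles=20, seed=0):
    """Trajectory-sampling evaluation sketch with a model ensemble.

    models: list of callables (assumed interface), each mapping
    (state, action) to a sampled next state from one probabilistic
    ensemble member.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_particles):
        s = state.copy()
        for a in action_seq:
            total += reward_fn(s, a)
            # Each particle follows a randomly re-drawn ensemble member
            # at every step, so epistemic (model) uncertainty is
            # propagated through the rollout by sampling.
            s = models[rng.integers(len(models))](s, a)
    # Averaging over particles approximates the expected return under
    # both dynamics noise and ensemble disagreement.
    return total / n_particles
```
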
Responsive Safety in Reinforcement Learning by PID Lagrangian Methods
TLDR: This work proposes a novel Lagrange multiplier update method that utilizes derivatives of the constraint function, and introduces a new method to ease controller tuning by providing invariance to the relative numerical scales of reward and cost.
Learning-Based Model Predictive Control for Safe Exploration
TLDR: This paper presents a learning-based model predictive control scheme that can provide provable high-probability safety guarantees and exploits regularity assumptions on the dynamics, in the form of a Gaussian process prior, to construct provably accurate confidence intervals on predicted trajectories.
Safe Model-based Reinforcement Learning with Stability Guarantees
TLDR: This paper presents a learning algorithm that explicitly considers safety, defined in terms of stability guarantees, extends control-theoretic results on Lyapunov stability verification, and shows how to use statistical models of the dynamics to obtain high-performance control policies with provable stability certificates.
A Lyapunov-based Approach to Safe Reinforcement Learning
TLDR: This work defines and presents a method for constructing Lyapunov functions, which provide an effective way to guarantee the global safety of a behavior policy during training via a set of local, linear constraints.
Safe Reinforcement Learning in Constrained Markov Decision Processes
TLDR: This paper proposes an algorithm, SNO-MDP, that explores and optimizes Markov decision processes under unknown safety constraints and takes a stepwise approach for optimizing safety and cumulative reward.
Variational Inference MPC for Bayesian Model-based Reinforcement Learning
TLDR: A variational inference MPC framework is introduced that reformulates various stochastic methods, including CEM, in a Bayesian fashion, along with a novel instance of the framework, called probabilistic action ensembles with trajectory sampling (PaETS), that can capture multimodal uncertainties in both dynamics and optimal trajectories.
Lyapunov-based Safe Policy Optimization for Continuous Control
TLDR: Safe policy optimization algorithms based on a Lyapunov approach are presented for continuous-action reinforcement learning problems in which it is crucial that the agent interacts with the environment only through safe policies, i.e., policies that do not take the agent to undesirable situations.