Corpus ID: 246411597

Constrained Variational Policy Optimization for Safe Reinforcement Learning

Zuxin Liu, Zhepeng Cen, Vladislav Isenbaev, Wei Liu, Zhiwei Steven Wu, Bo Li, Ding Zhao

Safe reinforcement learning (RL) aims to learn policies that satisfy certain constraints before they are deployed in safety-critical applications. Previous primal-dual approaches suffer from instability and lack optimality guarantees. This paper addresses these issues from the perspective of probabilistic inference. We introduce a novel Expectation-Maximization approach that naturally incorporates constraints during policy learning: 1) a provably optimal non-parametric variational…
