Towards Painless Policy Optimization for Constrained MDPs
@inproceedings{Jain2022TowardsPP,
  title     = {Towards Painless Policy Optimization for Constrained MDPs},
  author    = {Arushi Jain and Sharan Vaswani and Reza Babanezhad and Csaba Szepesvari and Doina Precup},
  booktitle = {Conference on Uncertainty in Artificial Intelligence},
  year      = {2022}
}
We study policy optimization in an infinite horizon, γ-discounted constrained Markov decision process (CMDP). Our objective is to return a policy that achieves large expected reward with a small constraint violation. We consider the online setting with linear function approximation and assume global access to the corresponding features. We propose a generic primal-dual framework that allows us to bound the reward sub-optimality and constraint violation for arbitrary algorithms in terms of…
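To make the primal-dual setup concrete, below is a minimal sketch of a single-constraint CMDP and its Lagrangian; the notation and the direction of the inequality are illustrative and not necessarily the paper's exact conventions.

```latex
% Illustrative single-constraint CMDP and its Lagrangian (generic conventions).
\begin{aligned}
\max_{\pi}\; & V_r^{\pi}(\rho) = \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty}\gamma^{t}\, r(s_t,a_t)\Big]
\quad \text{s.t.}\quad V_c^{\pi}(\rho) \ge b, \\
\mathcal{L}(\pi,\lambda) \;=\; & V_r^{\pi}(\rho) + \lambda\big(V_c^{\pi}(\rho) - b\big), \qquad \lambda \ge 0 .
\end{aligned}
```

Primal-dual methods then alternate an (approximate) maximization over π with a projected gradient step on the dual variable λ.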
2 Citations
Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning
- Computer Science, ArXiv
- 2022
An Anchor-changing Regularized Natural Policy Gradient (ARNPG) framework is proposed, which can systematically incorporate ideas from well-performing first-order methods into the design of policy optimization algorithms for multi-objective MDP problems.
Provable Reset-free Reinforcement Learning by No-Regret Reduction
- Computer Science, ArXiv
- 2023
This work proposes a generic no-regret reduction to systematically design reset-free RL algorithms, and designs an instantiation for linear Markov decision processes, which is the first provably correct reset-free RL algorithm to the authors' knowledge.
References
Provably Efficient Safe Exploration via Primal-Dual Policy Optimization
- Computer Science, AISTATS
- 2021
An Optimistic Primal-Dual Proximal Policy OPtimization (OPDOP) algorithm is proposed, in which the value function is estimated by combining least-squares policy evaluation with an additional bonus term for safe exploration; this is the first provably efficient policy optimization algorithm for CMDPs with safe exploration.
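A minimal sketch of the optimistic value estimation pattern the summary describes, i.e. ridge-regression value estimates plus an elliptical bonus; the feature map `phi` and the data arguments are hypothetical, and this is not the paper's exact OPDOP algorithm.

```python
import numpy as np

def optimistic_q_estimate(phi, targets, phi_query, beta=1.0, reg=1.0):
    """phi: (n, d) features of visited (s, a) pairs; targets: (n,) regression
    targets; phi_query: (d,) feature of the (s, a) pair to evaluate."""
    d = phi.shape[1]
    Lambda = reg * np.eye(d) + phi.T @ phi          # regularized Gram matrix
    w = np.linalg.solve(Lambda, phi.T @ targets)    # least-squares weights
    bonus = beta * np.sqrt(phi_query @ np.linalg.solve(Lambda, phi_query))
    return phi_query @ w + bonus                    # optimistic Q estimate
```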
Natural Policy Gradient Primal-Dual Method for Constrained Markov Decision Processes
- Computer Science, Mathematics, NeurIPS
- 2020
This work is the first to establish non-asymptotic convergence guarantees of policy-based primal-dual methods for solving infinite-horizon discounted CMDPs, and it is shown that two sample-based NPG-PD algorithms inherit such non-asymptotic convergence properties and provide finite-sample complexity guarantees.
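A minimal tabular sketch of one natural policy gradient primal-dual step, assuming exact action-value tables `Q_r`, `Q_c` and a scalar constraint value `V_c` are given; names and step sizes are illustrative, not the paper's exact algorithm.

```python
import numpy as np

def npg_pd_step(pi, Q_r, Q_c, lam, V_c, b, eta_pi=0.1, eta_lam=0.1, lam_max=10.0):
    """pi: (S, A) policy; Q_r, Q_c: (S, A) reward/constraint action values;
    lam: dual variable; V_c: scalar constraint value; b: constraint threshold."""
    Q_L = Q_r + lam * Q_c                        # Lagrangian action values
    logits = np.log(pi + 1e-12) + eta_pi * Q_L   # softmax NPG = multiplicative weights
    new_pi = np.exp(logits - logits.max(axis=1, keepdims=True))
    new_pi /= new_pi.sum(axis=1, keepdims=True)
    new_lam = np.clip(lam - eta_lam * (V_c - b), 0.0, lam_max)  # projected dual step
    return new_pi, new_lam
```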
CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee
- Computer Science, ICML
- 2021
CRPO is shown to achieve an O(1/√T) convergence rate to the globally optimal policy in the constrained policy set together with an error bound on constraint satisfaction, and the empirical results demonstrate that it can significantly outperform existing primal-dual baseline algorithms.
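A minimal sketch of the alternating rule the summary describes, assuming costs must stay below given limits; the update callables `reward_step` and `cost_steps[i]` are hypothetical helpers, so this is only an illustration of the control flow.

```python
def crpo_update(policy, reward_step, cost_values, limits, tol, cost_steps):
    """cost_values[i] should stay <= limits[i]; tol is the allowed tolerance."""
    violated = [i for i, (c, d) in enumerate(zip(cost_values, limits)) if c > d + tol]
    if not violated:
        return reward_step(policy)      # all constraints satisfied: improve the reward
    i = violated[0]                     # otherwise pick a violated constraint
    return cost_steps[i](policy)        # and take a step that decreases that cost
```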
IPO: Interior-point Policy Optimization under Constraints
- Computer Science, AAAI
- 2020
A novel first-order policy optimization method, Interior-point Policy Optimization (IPO), augments the objective with logarithmic barrier functions inspired by the interior-point method, and can handle general cumulative multi-constraint settings.
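A minimal sketch of a log-barrier augmented objective in the spirit of the summary above; `J_r`, `J_costs`, and the barrier parameter `t` are illustrative assumptions rather than IPO's exact formulation.

```python
import math

def barrier_objective(J_r, J_costs, thresholds, t=10.0):
    """Maximize J_r while keeping each cost J_costs[i] below thresholds[i]."""
    total = J_r
    for Jc, d in zip(J_costs, thresholds):
        slack = d - Jc
        if slack <= 0:                 # barrier is undefined once a constraint is violated
            return -math.inf
        total += math.log(slack) / t   # logarithmic barrier term
    return total
```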
Constrained Policy Optimization
- Computer Science, ICML
- 2017
Constrained Policy Optimization (CPO) is proposed, the first general-purpose policy search algorithm for constrained reinforcement learning with guarantees of near-constraint satisfaction at each iteration, which allows training neural network policies for high-dimensional control while making guarantees about policy behavior throughout training.
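For reference, the per-iteration update that the summary describes is commonly written as a trust-region problem over surrogate advantages; the single-constraint form below is illustrative and not necessarily CPO's exact statement.

```latex
% Trust-region form of a constrained policy update (illustrative, one constraint).
\begin{aligned}
\pi_{k+1} \;=\; \arg\max_{\pi}\;\; & \mathbb{E}_{s \sim d^{\pi_k},\, a \sim \pi}\big[A_r^{\pi_k}(s,a)\big] \\
\text{s.t.}\;\; & J_C(\pi_k) \;+\; \tfrac{1}{1-\gamma}\, \mathbb{E}_{s \sim d^{\pi_k},\, a \sim \pi}\big[A_C^{\pi_k}(s,a)\big] \;\le\; d, \\
& \bar{D}_{\mathrm{KL}}(\pi \,\|\, \pi_k) \;\le\; \delta .
\end{aligned}
```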
Accelerated Primal-Dual Policy Optimization for Safe Reinforcement Learning
- Computer Science, ArXiv
- 2018
This paper proposes a policy search method for CMDPs called Accelerated Primal-Dual Optimization (APDO), which incorporates an off-policy trained dual variable in the dual update procedure while updating the policy in primal space with an on-policy likelihood-ratio gradient.
POLITEX: Regret Bounds for Policy Iteration using Expert Prediction
- Computer Science, ICML
- 2019
POLITEX (POLicy ITeration with EXpert advice) is presented, a variant of policy iteration in which each policy is a Boltzmann distribution over the sum of action-value function estimates of the previous policies; the viability of POLITEX beyond linear function approximation is also confirmed.
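A minimal sketch of the policy construction the summary describes: a Boltzmann (softmax) distribution over the running sum of past action-value estimates. The list `Q_hats` and the temperature `eta` are illustrative assumptions.

```python
import numpy as np

def politex_policy(Q_hats, eta=0.1):
    """Q_hats: list of (S, A) action-value estimates from previous phases."""
    Q_sum = np.sum(Q_hats, axis=0)                    # sum of all past estimates
    logits = eta * Q_sum
    pi = np.exp(logits - logits.max(axis=1, keepdims=True))
    return pi / pi.sum(axis=1, keepdims=True)         # softmax per state
```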
Reward Constrained Policy Optimization
- Computer Science, ICLR
- 2019
This work presents a novel multi-timescale approach for constrained policy optimization, called 'Reward Constrained Policy Optimization' (RCPO), which uses an alternative penalty signal to guide the policy toward a constraint-satisfying one.
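A minimal sketch of the two-timescale idea behind the summary: the policy is trained on a penalized reward, while the penalty coefficient is adapted more slowly from the observed constraint violation. The helper names and step sizes are hypothetical.

```python
def penalized_reward(r, c, lam):
    """Penalty signal seen by the policy: reward minus weighted cost."""
    return r - lam * c

def dual_update(lam, J_c, alpha, lr_lam=1e-3):
    """Slow-timescale update of lam; J_c: estimated expected cost, alpha: allowed cost."""
    return max(0.0, lam + lr_lam * (J_c - alpha))
```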
Chance-constrained dynamic programming with application to risk-aware robotic space exploration
- Computer Science, Auton. Robots
- 2015
This paper presents a novel algorithmic approach to reformulate a joint chance constraint as a constraint on the expectation of a summation of indicator random variables, which can be incorporated into the cost function by considering a dual formulation of the optimization problem.
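A sketch of the reformulation the summary describes, assuming a safe set S_safe and a risk budget Δ (symbols illustrative): the joint chance constraint is upper-bounded by an expected sum of indicator variables via Boole's inequality, and the resulting constraint is moved into the cost through a dual formulation.

```latex
% Illustrative reformulation of a joint chance constraint via indicators.
\begin{aligned}
\Pr\Big(\bigcup_{t=0}^{T}\{s_t \notin \mathcal{S}_{\mathrm{safe}}\}\Big)
  \;\le\; \mathbb{E}\Big[\sum_{t=0}^{T}\mathbf{1}\{s_t \notin \mathcal{S}_{\mathrm{safe}}\}\Big]
  \;\le\; \Delta, \\
\min_{\pi}\;\; \mathbb{E}[\mathrm{cost}]
  \;+\; \lambda\Big(\mathbb{E}\Big[\sum_{t=0}^{T}\mathbf{1}\{s_t \notin \mathcal{S}_{\mathrm{safe}}\}\Big] - \Delta\Big),
  \qquad \lambda \ge 0 .
\end{aligned}
```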
A Lyapunov-based Approach to Safe Reinforcement Learning
- Computer Science, NeurIPS
- 2018
This work defines and presents a method for constructing Lyapunov functions, which provide an effective way to guarantee the global safety of a behavior policy during training via a set of local, linear constraints.