# Risk-Sensitive Reinforcement Learning: A Constrained Optimization Viewpoint

@article{PrashanthL2018RiskSensitiveRL, title={Risk-Sensitive Reinforcement Learning: A Constrained Optimization Viewpoint}, author={L. A. Prashanth and Michael C. Fu}, journal={ArXiv}, year={2018}, volume={abs/1810.09126} }

The classic objective in a reinforcement learning (RL) problem is to find a policy that minimizes, in expectation, a long-run objective such as the infinite-horizon discounted or long-run average cost. In many practical applications, optimizing the expected value alone is not sufficient, and it may be necessary to include a risk measure in the optimization process, either as the objective or as a constraint. Various risk measures have been proposed in the literature, e.g., mean-variance…
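As a hedged illustration of the constrained viewpoint named in the title, one generic way to write a risk-constrained problem and its Lagrangian relaxation is sketched below. The symbols are assumptions made for this sketch and are not taken from the truncated abstract above: $J(\theta)$ is the expected long-run cost of a policy parameterized by $\theta$, $G(\theta)$ is a risk measure (for example, a variance), $\alpha$ is a user-chosen risk threshold, and $\lambda$ is a Lagrange multiplier.

```latex
\min_{\theta} \; J(\theta)
\quad \text{subject to} \quad G(\theta) \le \alpha,
\qquad
L(\theta, \lambda) = J(\theta) + \lambda \bigl( G(\theta) - \alpha \bigr),
\quad \lambda \ge 0 .
```

In a formulation of this kind, a common approach is to descend in $\theta$ and ascend in $\lambda$ on $L(\theta, \lambda)$; the specific schemes the paper discusses may differ.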

## 24 Citations

Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy

- Computer Science, Mathematics
- ArXiv
- 2020

This work makes the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria, and proposes an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.

A Convex Programming Approach to Data-Driven Risk-Averse Reinforcement Learning

- Computer Science, Engineering
- ArXiv
- 2021

This paper presents a model-free reinforcement learning (RL) algorithm to solve the risk-averse optimal control (RAOC) problem for discrete-time nonlinear systems, together with data-driven implementations based on the Q-function that enable learning the optimal value without any knowledge of the system dynamics.

Accelerating Safe Reinforcement Learning with Constraint-mismatched Baseline Policies

- Computer Science
- ICML
- 2021

An iterative policy optimization algorithm that alternates between maximizing expected return on the task, minimizing distance to the baseline policy, and projecting the policy onto the constraint-satisfying set is proposed.

DSAC: Distributional Soft Actor Critic for Risk-Sensitive Reinforcement Learning

- Computer Science
- 2020

This work presents a new reinforcement learning algorithm, Distributional Soft Actor Critic (DSAC), which exploits the distributional information of accumulated rewards to achieve better performance, and proposes a unified framework for risk-sensitive learning.

Primal-dual Learning for the Model-free Risk-constrained Linear Quadratic Regulator

- Computer Science, Engineering
- L4DC
- 2021

This work proposes a model-free framework to learn a risk-aware controller with a focus on the linear system and proposes a primal-dual algorithm with global convergence to learn the optimal policy-multiplier pair.

A Gradient-Aware Search Algorithm for Constrained Markov Decision Processes

- Computer Science, Engineering
- ArXiv
- 2020

This brief proves that the optimization objective in the dual linear program of a finite CMDP is a piece-wise linear convex function (PWLC) with respect to the Lagrange penalty multipliers, and proposes a novel two-level Gradient-Aware Search (GAS) algorithm which exploits the PWLC structure to find the optimal state-value function and Lagrange penalties.

Reinforcement Learning Beyond Expectation

- Computer Science, Engineering
- ArXiv
- 2021

Two algorithms that enable agents to learn policies optimizing the CPT-value are developed, and it is demonstrated that the behaviors of agents trained with these algorithms are better aligned with those of a human user who might be placed in the same environment, and are significantly improved over a baseline that optimizes an expected utility.

SENTINEL: Taming Uncertainty with Ensemble-based Distributional Reinforcement Learning

- Computer Science
- ArXiv
- 2021

This paper introduces a novel quantification of risk, namely composite risk, which takes into account both aleatory and epistemic risk during the learning process, and proposes to use a bootstrapping method, SENTINEL-K, for distributional RL.

Linear Quadratic Control with Risk Constraints

- Computer Science, Mathematics
- ArXiv
- 2021

A new risk constraint is introduced, which explicitly restricts the total expected predictive variance of the state penalty by a user-prescribed level, and it is proved that the new risk-aware controller is internally stable, regardless of parameter tuning, in the special cases of i) fully-observed systems, and ii) partially-observed systems with Gaussian noise.

Improving Robustness via Risk Averse Distributional Reinforcement Learning

- Computer Science, Mathematics
- L4DC
- 2020

This work proposes a risk-aware algorithm to learn robust policies in order to bridge the gap between simulation training and real-world implementation; it is based on the recently discovered distributional RL framework and includes the CVaR risk measure in sample-based distributional policy gradients (SDPG) for learning risk-averse policies.

## References

Risk-Sensitive Reinforcement Learning

- Mathematics, Computer Science
- Machine Learning
- 2004

This risk-sensitive reinforcement learning algorithm is based on a very different philosophy and reflects important properties of the classical exponential utility framework, but avoids its serious drawbacks for learning.

Risk-Sensitive Reinforcement Learning

- Computer Science, Mathematics
- Neural Computation
- 2014

A risk-sensitive Q-learning algorithm is derived, which is necessary for modeling human behavior when transition probabilities are unknown, and applied to quantify human behavior in a sequential investment task and is found to provide a significantly better fit to the behavioral data and leads to an interpretation of the subject's responses that is indeed consistent with prospect theory.

Variance-constrained actor-critic algorithms for discounted and average reward MDPs

- Computer Science, Mathematics
- Machine Learning
- 2016

This paper considers both discounted and average reward Markov decision processes and devises actor-critic algorithms that operate on three timescales: a TD critic on the fastest timescale, a policy gradient (actor) on the intermediate timescale, and a dual ascent for Lagrange multipliers on the slowest timescale.

Actor-Critic Algorithms for Risk-Sensitive MDPs

- Computer Science, Mathematics
- NIPS
- 2013

This paper considers both discounted and average reward Markov decision processes, devises actor-critic algorithms for estimating the gradient and updating the policy parameters in the ascent direction, and establishes the convergence of the algorithms to locally risk-sensitive optimal policies.

Infinite-Horizon Policy-Gradient Estimation

- Computer Science, Mathematics
- J. Artif. Intell. Res.
- 2001

GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies, is introduced.

Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control

- Computer Science, Mathematics
- ICML
- 2016

This work brings cumulative prospect theory to a risk-sensitive reinforcement learning (RL) setting, designs algorithms for both estimation and control, and provides theoretical convergence guarantees for all the proposed algorithms.

More Risk-Sensitive Markov Decision Processes

- Mathematics, Computer Science
- Math. Oper. Res.
- 2014

It turns out that under suitable recurrence conditions on the MDP for convex power utility, the minimal average cost does not depend on the parameter of the utility function and is equal to the risk-neutral average cost, in contrast to the classical risk-sensitive criterion with exponential utility.

Policy Gradient for Coherent Risk Measures

- Computer Science, Mathematics
- NIPS
- 2015

This work extends the policy gradient method to the whole class of coherent risk measures, which is widely accepted in finance and operations research, among other fields, and presents a unified approach to risk-sensitive reinforcement learning that generalizes and extends previous results.

Learning Algorithms for Risk-Sensitive Control

- 2010

This is a survey of some reinforcement learning algorithms for risk-sensitive control on infinite horizon. Basics of the risk-sensitive control problem are recalled, notably the corresponding dynamic…

Policy Gradients with Variance Related Risk Criteria

- Computer Science, Mathematics
- ICML
- 2012

A framework of local policy-gradient-style algorithms for reinforcement learning with variance-related criteria, that is, criteria that involve both the expected cost and the variance of the cost.