Corpus ID: 53047274

Risk-Sensitive Reinforcement Learning: A Constrained Optimization Viewpoint

@article{PrashanthL2018RiskSensitiveRL,
  title={Risk-Sensitive Reinforcement Learning: A Constrained Optimization Viewpoint},
  author={L. A. Prashanth and Michael C. Fu},
  journal={ArXiv},
  year={2018},
  volume={abs/1810.09126}
}
The classic objective in a reinforcement learning (RL) problem is to find a policy that minimizes, in expectation, a long-run objective such as the infinite-horizon discounted or long-run average cost. In many practical applications, optimizing the expected value alone is not sufficient, and it may be necessary to include a risk measure in the optimization process, either as the objective or as a constraint. Various risk measures have been proposed in the literature, e.g., mean-variance…
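To make the constrained-optimization viewpoint concrete, a generic formulation can be written as follows. This is a sketch only: the symbols J(θ) for the expected long-run cost, G(θ) for a chosen risk measure of the cost, and α for a user-specified risk tolerance are illustrative notation, not taken verbatim from the paper.

\begin{align*}
  \min_{\theta}\; & J(\theta) && \text{(expected long-run cost under policy parameter } \theta\text{)} \\
  \text{subject to}\; & G(\theta) \le \alpha && \text{(risk measure of the cost, e.g., variance or CVaR)}
\end{align*}

which is typically handled via the Lagrangian

\begin{equation*}
  \mathcal{L}(\theta, \lambda) \;=\; J(\theta) + \lambda \bigl( G(\theta) - \alpha \bigr),
  \qquad \max_{\lambda \ge 0}\; \min_{\theta}\; \mathcal{L}(\theta, \lambda),
\end{equation*}

with gradient descent in the policy parameter θ and projected dual ascent in the multiplier λ.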
Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy
TLDR: This work makes the first attempt to study risk-sensitive deep reinforcement learning under the average-reward setting with the variance risk criterion, and proposes an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable (a generic primal-dual sketch in this spirit appears after this list).
A Convex Programming Approach to Data-Driven Risk-Averse Reinforcement Learning
TLDR: This paper presents a model-free reinforcement learning (RL) algorithm to solve the risk-averse optimal control (RAOC) problem for discrete-time nonlinear systems, together with data-driven, Q-function-based implementations that learn the optimal value without any knowledge of the system dynamics.
Accelerating Safe Reinforcement Learning with Constraint-mismatched Baseline Policies
TLDR: An iterative policy optimization algorithm is proposed that alternates between maximizing expected return on the task, minimizing distance to the baseline policy, and projecting the policy onto the constraint-satisfying set.
DSAC: Distributional Soft Actor Critic for Risk-Sensitive Reinforcement Learning
TLDR: A new reinforcement learning algorithm called Distributional Soft Actor Critic (DSAC) is introduced, which exploits the distributional information of accumulated rewards to achieve better performance, together with a unified framework for risk-sensitive learning.
Primal-dual Learning for the Model-free Risk-constrained Linear Quadratic Regulator
TLDR: This work proposes a model-free framework to learn a risk-aware controller, with a focus on linear systems, and a primal-dual algorithm with global convergence to learn the optimal policy-multiplier pair.
A Gradient-Aware Search Algorithm for Constrained Markov Decision Processes
TLDR: This brief proves that the optimization objective in the dual linear program of a finite CMDP is a piecewise-linear convex (PWLC) function with respect to the Lagrange penalty multipliers, and proposes a novel two-level Gradient-Aware Search (GAS) algorithm that exploits the PWLC structure to find the optimal state-value function and Lagrange penalties.
Reinforcement Learning Beyond Expectation
TLDR: Two algorithms are developed to enable agents to learn policies that optimize the CPT-value, and it is demonstrated that the behavior of an agent trained with these algorithms is better aligned with that of a human user placed in the same environment, and improves significantly over a baseline that optimizes an expected utility.
SENTINEL: Taming Uncertainty with Ensemble-based Distributional Reinforcement Learning
TLDR: This paper introduces a novel quantification of risk, namely composite risk, which takes into account both aleatory and epistemic risk during the learning process, and proposes a bootstrapping method, SENTINEL-K, for distributional RL.
Improving Robustness via Risk Averse Distributional Reinforcement Learning
TLDR: This work proposes a risk-aware algorithm, built on the recently introduced distributional RL framework, to learn robust policies that bridge the gap between simulation training and real-world deployment; it incorporates the CVaR risk measure into sample-based distributional policy gradients (SDPG) for learning risk-averse policies.
Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret
TLDR: The results demonstrate that incorporating risk awareness into reinforcement learning necessitates an exponential cost in the risk parameter β and the horizon H, which quantifies the fundamental tradeoff between risk sensitivity and sample efficiency.
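Several of the works listed above (e.g., the variance-constrained actor-critic and the risk-constrained LQR entries) share a common primal-dual update pattern: a gradient step on the policy parameters interleaved with a dual-ascent step on a Lagrange multiplier that enforces the risk constraint. The following minimal Python sketch shows only that pattern; the function names (grad_J, grad_G, G) and the generic risk constraint are illustrative assumptions, not code from any of the cited papers, and real algorithms estimate these quantities from sampled trajectories.

import numpy as np

def primal_dual_risk_constrained(grad_J, grad_G, G, alpha, theta0,
                                 lam0=0.0, lr_theta=1e-2, lr_lam=1e-3,
                                 iters=10_000):
    """Generic primal-dual loop for: min_theta J(theta) s.t. G(theta) <= alpha.

    grad_J(theta) -- (estimated) gradient of the expected cost J
    grad_G(theta) -- (estimated) gradient of the risk measure G
    G(theta)      -- (estimated) value of the risk measure
    alpha         -- user-specified risk tolerance
    """
    theta, lam = np.asarray(theta0, dtype=float), float(lam0)
    for _ in range(iters):
        # Primal step: descend the Lagrangian J + lam * (G - alpha) in theta.
        theta = theta - lr_theta * (grad_J(theta) + lam * grad_G(theta))
        # Dual step: ascend in lam, projected onto lam >= 0.
        lam = max(0.0, lam + lr_lam * (G(theta) - alpha))
    return theta, lam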

References

Showing 1-10 of 85 references
Risk-Sensitive Reinforcement Learning
TLDR: This risk-sensitive reinforcement learning algorithm is based on a very different philosophy and reflects important properties of the classical exponential utility framework, but avoids its serious drawbacks for learning.
Risk-Sensitive Reinforcement Learning
TLDR: A risk-sensitive Q-learning algorithm is derived, which is necessary for modeling human behavior when transition probabilities are unknown; applied to a sequential investment task, it provides a significantly better fit to the behavioral data and leads to an interpretation of the subjects' responses that is consistent with prospect theory (a minimal utility-transformed Q-learning sketch appears after this list).
Variance-constrained actor-critic algorithms for discounted and average reward MDPs
TLDR: This paper considers both discounted and average-reward Markov decision processes and devises actor-critic algorithms that operate on three timescales: a TD critic on the fastest timescale, a policy gradient (actor) on the intermediate timescale, and a dual ascent for Lagrange multipliers on the slowest timescale.
Actor-Critic Algorithms for Risk-Sensitive MDPs
TLDR: This paper considers both discounted and average-reward Markov decision processes and devises actor-critic algorithms for estimating the gradient and updating the policy parameters in the ascent direction, establishing the convergence of the algorithms to locally risk-sensitive optimal policies.
Infinite-Horizon Policy-Gradient Estimation
TLDR: GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies, is introduced.
Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control
TLDR: This work brings cumulative prospect theory to a risk-sensitive reinforcement learning (RL) setting, designs algorithms for both estimation and control, and provides theoretical convergence guarantees for all the proposed algorithms.
More Risk-Sensitive Markov Decision Processes
TLDR: It turns out that, under suitable recurrence conditions on the MDP, for convex power utility the minimal average cost does not depend on the parameter of the utility function and equals the risk-neutral average cost, in contrast to the classical risk-sensitive criterion with exponential utility.
Policy Gradient for Coherent Risk Measures
TLDR: This work extends the policy gradient method to the whole class of coherent risk measures, which are widely used in finance and operations research, among other fields, and presents a unified approach to risk-sensitive reinforcement learning that generalizes and extends previous results.
Learning Algorithms for Risk-Sensitive Control
This is a survey of some reinforcement learning algorithms for risk-sensitive control on the infinite horizon. Basics of the risk-sensitive control problem are recalled, notably the corresponding dynamic…
Policy Gradients with Variance Related Risk Criteria
TLDR: A framework for local policy-gradient-style reinforcement learning algorithms for variance-related criteria, covering criteria that involve both the expected cost and the variance of the cost.
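To illustrate the utility-transformed temporal-difference idea behind the risk-sensitive Q-learning references above, here is a minimal tabular sketch. The Gymnasium-style environment interface, the piecewise-linear utility u, and all hyperparameters are assumptions made for illustration; this is not the exact formulation of either cited paper.

import numpy as np

def risk_sensitive_q_learning(env, n_states, n_actions, episodes=500,
                              lr=0.1, gamma=0.99, eps=0.1, kappa=0.5, seed=0):
    """Tabular Q-learning with the TD error passed through a utility.

    The piecewise-linear utility u(d) = (1 - kappa) * d for d > 0 and
    (1 + kappa) * d otherwise (kappa in (-1, 1)) down-weights positive
    surprises and up-weights negative ones, giving risk-averse behavior
    for kappa > 0. Assumes integer (tabular) states and a Gymnasium-style
    reset()/step() interface.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))

    def u(d):  # utility applied to the temporal-difference error
        return (1.0 - kappa) * d if d > 0 else (1.0 + kappa) * d

    for _ in range(episodes):
        s, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection.
            a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
            s_next, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            td = r + (0.0 if done else gamma * np.max(Q[s_next])) - Q[s, a]
            Q[s, a] += lr * u(td)  # risk-sensitive update
            s = s_next
    return Q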