Corpus ID: 7037760

Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control

  title={Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control},
  author={Prashanth L.A. and Cheng Jie and Michael C. Fu and Steven I. Marcus and Csaba Szepesvari},
  booktitle={International Conference on Machine Learning},
Cumulative prospect theory (CPT) is known to model human decisions well, with substantial empirical evidence supporting this claim. CPT works by distorting probabilities and is more general than the classic expected utility and coherent risk measures. We bring this idea to a risk-sensitive reinforcement learning (RL) setting and design algorithms for both estimation and control. The RL setting presents two particular challenges when CPT is applied: estimating the CPT objective requires… 
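The CPT-value mentioned above can be estimated from i.i.d. samples of the return by applying distorted tail probabilities to the sorted, utility-transformed samples. A minimal sketch, assuming the Tversky–Kahneman utility and weighting functions with their commonly cited parameter estimates (alpha = 0.88, lambda = 2.25, delta = 0.69); function names are illustrative, not the paper's:

```python
import numpy as np

def tk_weight(p, delta=0.69):
    """Tversky-Kahneman probability weighting (distortion) function."""
    return p**delta / (p**delta + (1.0 - p)**delta)**(1.0 / delta)

def cpt_value(samples, alpha=0.88, lam=2.25, delta=0.69):
    """Empirical CPT-value estimate from i.i.d. samples (a sketch).

    Gains use utility u+(x) = x**alpha, losses u-(x) = lam * (-x)**alpha;
    sorted utilities are weighted by differences of the distorted
    empirical tail probabilities, so extreme outcomes are over- or
    under-weighted relative to plain expectation.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    gains = np.where(x > 0, np.abs(x)**alpha, 0.0)
    losses = np.where(x < 0, lam * np.abs(x)**alpha, 0.0)
    i = np.arange(n)
    w_gain = tk_weight((n - i) / n, delta) - tk_weight((n - i - 1) / n, delta)
    w_loss = tk_weight((i + 1) / n, delta) - tk_weight(i / n, delta)
    return float(np.dot(gains, w_gain) - np.dot(losses, w_loss))
```

For a degenerate (constant) sample the distorted weights telescope to 1, so the estimate reduces to the utility of that constant.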

Stochastic Optimization in a Cumulative Prospect Theory Framework

This paper proposes both gradient-based and gradient-free CPT-value optimization algorithms, based on two well-known simulation optimization ideas: simultaneous perturbation stochastic approximation and model-based parameter search, respectively.
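The simultaneous perturbation idea can be sketched as follows: the gradient of a noisy scalar objective (such as an estimated CPT-value) is approximated from only two function evaluations using a random Rademacher perturbation, regardless of the parameter dimension. This is a generic SPSA sketch with illustrative names and step sizes, not the paper's exact algorithm:

```python
import numpy as np

def spsa_gradient(f, theta, c=0.1, rng=None):
    """One simultaneous-perturbation gradient estimate of f at theta.

    f is a (possibly noisy) scalar objective; exactly two evaluations
    of f are needed, independent of the dimension of theta.
    """
    rng = rng or np.random.default_rng()
    delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Rademacher perturbation
    return (f(theta + c * delta) - f(theta - c * delta)) / (2.0 * c * delta)
```

A gradient-ascent loop would then repeat `theta += a_k * spsa_gradient(objective, theta, c=c_k)` with diminishing step sizes `a_k` and perturbation sizes `c_k`.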

Risk-Sensitive Reinforcement Learning: A Constrained Optimization Viewpoint

This article focuses on the combination of risk criteria and reinforcement learning in a constrained optimization framework, i.e., a setting where the goal is to find a policy that optimizes the usual objective of infinite-horizon discounted/average cost while ensuring that an explicit risk constraint is satisfied.

Risk-sensitive Reinforcement Learning via Distortion Risk Measures

This work proposes policy gradient algorithms that maximize the DRM of the cumulative reward in an episodic Markov decision process, in both on-policy and off-policy RL settings.

On the Convergence and Optimality of Policy Gradient for Markov Coherent Risk

It is demonstrated that when the optimality gap is small, PG can learn risk-sensitive policies, and it is found that instances with large suboptimality gaps are abundant and easy to construct, outlining an important challenge for future research.

Inverse Risk-Sensitive Reinforcement Learning

This paper presents a risk-sensitive reinforcement learning algorithm with convergence guarantees that makes use of coherent risk metrics and models of human decision-making originating in behavioral psychology and economics.

Likelihood ratio-based policy gradient methods for distorted risk measures: A non-asymptotic analysis

A variant of the policy gradient theorem that caters to the DRM objective is derived and used in conjunction with a likelihood ratio (LR) based gradient estimation scheme to propose policy gradient algorithms for optimizing DRM in both on-policy and off-policy RL settings.
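The likelihood-ratio trick underlying such schemes can be sketched generically: the gradient of an expectation over trajectories is a Monte Carlo average of each episode's score multiplied by the sum of score-function terms. This is the plain REINFORCE-style estimator, not the paper's DRM-specific variant; names are illustrative:

```python
import numpy as np

def lr_gradient(grad_log_probs, scores):
    """Generic likelihood-ratio (score-function) gradient estimate.

    grad_log_probs: per-episode sums of grad log pi(a|s; theta),
    shape (episodes, dim); scores: per-episode scalar objective values.
    Returns the Monte Carlo average of scores[i] * grad_log_probs[i].
    """
    g = np.asarray(grad_log_probs, dtype=float)
    r = np.asarray(scores, dtype=float)
    return (g * r[:, None]).mean(axis=0)
```

The DRM variant derived in the paper replaces the plain per-episode scores with terms reflecting the distorted distribution of the return.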

Deep CPT-RL: Imparting Human-Like Risk Sensitivity to Artificial Agents

This work quantitatively compares the distribution of outcomes when optimizing full-episode expected reward, CPT-value, and conditional value-at-risk in the CrowdSim robot navigation environment, elucidating the impacts of different objectives on the agent’s willingness to trade safety for speed.

Gradient-based inverse risk-sensitive reinforcement learning

This work proposes a gradient-based inverse reinforcement learning algorithm that minimizes a loss function defined on the observed behavior and is compatible with Markov decision processes where the agent is risk-sensitive.

Risk-sensitive inverse reinforcement learning via semi- and non-parametric methods

Comparisons with a risk-neutral model show that the RS-IRL framework more accurately captures observed participant behavior both qualitatively and quantitatively, especially in scenarios where catastrophic outcomes such as collisions can occur.

Reinforcement Learning Beyond Expectation

Two algorithms are developed that enable agents to learn policies optimizing the CPT-value, and it is demonstrated that the behavior of an agent trained with these algorithms is better aligned with that of a human user placed in the same environment, and significantly improved over a baseline that optimizes an expected utility.



Cumulative Prospect Theory Meets Reinforcement Learning: Estimation and Control

Using an empirical distribution over the policy space in conjunction with Kullback-Leibler (KL) divergence to the reference distribution, this work obtains a global policy optimization scheme and provides theoretical convergence guarantees for all the proposed algorithms.

Actor-Critic Algorithms for Risk-Sensitive MDPs

This paper considers both discounted and average-reward Markov decision processes, devises actor-critic algorithms for estimating the gradient and updating the policy parameters in the ascent direction, and establishes convergence of the algorithms to locally risk-sensitive optimal policies.

Learning Algorithms for Risk-Sensitive Control

Two learning schemes, Q-learning and the actor-critic method, are described along with their convergence analysis; these are stochastic approximation versions of the traditional iterative schemes for solving dynamic programs.

Policy Gradients for CVaR-Constrained MDPs

Two algorithms are proposed that obtain a locally risk-optimal policy by employing four tools: stochastic approximation, mini batches, policy gradients and importance sampling, and an importance sampling based variance reduction scheme is incorporated into these algorithms.
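The CVaR objective these algorithms constrain can itself be estimated from cost samples via the Rockafellar–Uryasev representation: the VaR quantile plus the scaled mean excess over it, which is roughly the mean of the worst (1 - alpha) fraction of costs. A minimal sketch (names are illustrative):

```python
import numpy as np

def cvar(costs, alpha=0.95):
    """Empirical CVaR_alpha of a cost sample (Rockafellar-Uryasev form)."""
    x = np.asarray(costs, dtype=float)
    var = np.quantile(x, alpha)  # Value-at-Risk (alpha-quantile) estimate
    # CVaR = VaR + E[(cost - VaR)+] / (1 - alpha)
    return float(var + np.mean(np.maximum(x - var, 0.0)) / (1.0 - alpha))
```

For costs 1..100 at alpha = 0.9 this returns the mean of the worst ten costs, 95.5.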

Algorithms for CVaR Optimization in MDPs

This paper first derives a formula for computing the gradient of this risk-sensitive objective function, then devises policy gradient and actor-critic algorithms, each of which uses a specific method to estimate the gradient and update the policy parameters in the descent direction.

Policy Gradients with Variance Related Risk Criteria

A framework for local policy gradient-style reinforcement learning algorithms for variance-related criteria, i.e., criteria that involve both the expected cost and the variance of the cost.

Reinforcement Learning With Function Approximation for Traffic Signal Control

A reinforcement learning (RL) algorithm with function approximation for traffic signal control that incorporates state-action features, is easily implementable in high-dimensional settings, and outperforms all the other algorithms on all the road network settings considered.

A simultaneous perturbation stochastic approximation-based actor-critic algorithm for Markov decision processes

A two-timescale simulation-based actor-critic algorithm is proposed for solving infinite-horizon Markov decision processes with finite state and compact action spaces under the discounted cost criterion, and a proof of convergence to a locally optimal policy is presented.