Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control
@inproceedings{PrashanthL2015CumulativePT, title={Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control}, author={Prashanth L.A. and Cheng Jie and Michael C. Fu and Steven I. Marcus and Csaba Szepesvari}, booktitle={International Conference on Machine Learning}, year={2015} }
Cumulative prospect theory (CPT) is known to model human decisions well, with substantial empirical evidence supporting this claim. CPT works by distorting probabilities and is more general than the classic expected utility and coherent risk measures. We bring this idea to a risk-sensitive reinforcement learning (RL) setting and design algorithms for both estimation and control. The RL setting presents two particular challenges when CPT is applied: estimating the CPT objective requires…
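The CPT-value replaces a plain expectation: gains and losses are valued by separate utility functions and tail probabilities are distorted by weighting functions. As a rough illustration of the quantile-based estimation idea the paper builds on, the sketch below estimates the CPT-value of a return from i.i.d. samples; the particular utility and weighting functions and their parameter values are common CPT calibrations used here as assumptions, not necessarily the paper's exact choices.

```python
import numpy as np

def weight(p, eta=0.61):
    """Tversky-Kahneman-style probability weighting function (illustrative choice)."""
    return p**eta / (p**eta + (1.0 - p)**eta) ** (1.0 / eta)

def cpt_value_estimate(samples, alpha=0.88, lam=2.25, eta_gain=0.61, eta_loss=0.69):
    """Quantile-style CPT-value estimate from i.i.d. samples of a return X.

    Gains are valued by u+(x) = x**alpha, losses by u-(x) = lam * |x|**alpha,
    and tail probabilities are distorted by `weight`. All parameter values are
    illustrative assumptions.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    ks = np.arange(1, n + 1)

    # Positive part: sum_i u+(x_(i)) * [w+((n-i+1)/n) - w+((n-i)/n)]
    gains = np.maximum(x, 0.0) ** alpha
    pos = np.sum(gains * (weight((n - ks + 1) / n, eta_gain) - weight((n - ks) / n, eta_gain)))

    # Negative part: sum_i u-(x_(i)) * [w-(i/n) - w-((i-1)/n)]
    losses = lam * np.abs(np.minimum(x, 0.0)) ** alpha
    neg = np.sum(losses * (weight(ks / n, eta_loss) - weight((ks - 1) / n, eta_loss)))

    return pos - neg

# Example: for a symmetric return distribution the CPT-value is typically negative,
# since losses are weighted more heavily than gains (lam > 1).
rng = np.random.default_rng(0)
print(cpt_value_estimate(rng.normal(0.0, 1.0, size=10_000)))
```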
70 Citations
Stochastic Optimization in a Cumulative Prospect Theory Framework
- Computer Science · IEEE Transactions on Automatic Control
- 2018
This paper proposes both gradient-based and gradient-free CPT-value optimization algorithms that are based on two well-known simulation optimization ideas: simultaneous perturbation stochastic approximation and model-based parameter search, respectively.
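Simultaneous perturbation stochastic approximation (SPSA) estimates a gradient from only two noisy function evaluations per step, regardless of dimension, which is why it fits objectives such as the CPT-value that can only be simulated. The following is a minimal sketch of generic SPSA under assumed step-size and perturbation-size schedules, not the cited paper's tuned algorithm.

```python
import numpy as np

def spsa_minimize(f, theta0, n_iter=200, a=0.1, c=0.1, seed=0):
    """Simultaneous perturbation stochastic approximation (SPSA).

    `f` is a (possibly noisy) scalar objective to minimize; each iteration uses
    two evaluations of `f` at simultaneously perturbed points. The gain
    sequences below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(1, n_iter + 1):
        a_k = a / k            # decaying step size
        c_k = c / k ** 0.25    # decaying perturbation size
        delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Rademacher perturbation
        g_hat = (f(theta + c_k * delta) - f(theta - c_k * delta)) / (2.0 * c_k * delta)
        theta -= a_k * g_hat
    return theta

# Example: minimize a noisy quadratic with optimum at (2, 2, 2).
rng = np.random.default_rng(1)
noisy_quadratic = lambda th: np.sum((th - 2.0) ** 2) + 0.01 * rng.normal()
print(spsa_minimize(noisy_quadratic, np.zeros(3)))
```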
Risk-Sensitive Reinforcement Learning: A Constrained Optimization Viewpoint
- Economics · ArXiv
- 2018
This article focuses on the combination of risk criteria and reinforcement learning in a constrained optimization framework, i.e., a setting where the goal is to find a policy that optimizes the usual objective of infinite-horizon discounted/average cost, while ensuring that an explicit risk constraint is satisfied.
Risk-sensitive Reinforcement Learning via Distortion Risk Measures
- Computer Science
- 2022
This work proposes policy gradient algorithms, which maximize the DRM of the cumulative reward in an episodic Markov decision process in on-policy as well as off-policy RL settings.
On the Convergence and Optimality of Policy Gradient for Markov Coherent Risk
- Computer Science · ArXiv
- 2021
It is demonstrated that when the optimality gap is small, PG can learn risk-sensitive policies, and it is found that instances with large suboptimality gaps are abundant and easy to construct, outlining an important challenge for future research.
Inverse Risk-Sensitive Reinforcement Learning
- Computer Science · IEEE Transactions on Automatic Control
- 2020
A risk-sensitive reinforcement learning algorithm with convergence guarantees is presented; it makes use of coherent risk metrics and models of human decision-making that have their origins in behavioral psychology and economics.
Likelihood ratio-based policy gradient methods for distorted risk measures: A non-asymptotic analysis
- Computer Science · ArXiv
- 2021
A variant of the policy gradient theorem that caters to the DRM objective is derived and used in conjunction with a likelihood ratio (LR) based gradient estimation scheme to propose policy gradient algorithms for optimizing DRM in both on-policy and off-policy RL settings.
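The likelihood-ratio (score-function) trick underlying such estimators rewrites the gradient of an expectation as an expectation weighted by the score of the sampling distribution; the cited work's DRM-specific policy gradient theorem is more involved, but the building block can be sketched on a toy parameterized distribution (the Gaussian model below is an assumption for illustration).

```python
import numpy as np

def lr_gradient_estimate(theta, f, n_samples=100_000, seed=0):
    """Likelihood-ratio (score-function) gradient estimate:
    d/dtheta E[f(X)] = E[f(X) * d/dtheta log p(X; theta)],
    shown here for X ~ Normal(theta, 1) as an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(theta, 1.0, size=n_samples)
    score = x - theta  # d/dtheta log N(x; theta, 1) = x - theta
    return np.mean(f(x) * score)

# Sanity check: for f(x) = x**2, E[X^2] = theta^2 + 1, so the gradient is 2*theta.
print(lr_gradient_estimate(theta=1.5, f=lambda x: x**2))  # approximately 3.0
```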
Deep CPT-RL: Imparting Human-Like Risk Sensitivity to Artificial Agents
- Economics · SafeAI@AAAI
- 2021
This work quantitatively compares the distribution of outcomes when optimizing full-episode expected reward, CPT-value, and conditional value-at-risk in the CrowdSim robot navigation environment, elucidating the impacts of different objectives on the agent’s willingness to trade safety for speed.
Gradient-based inverse risk-sensitive reinforcement learning
- Computer Science · 2017 IEEE 56th Annual Conference on Decision and Control (CDC)
- 2017
This work proposes a gradient-based inverse reinforcement learning algorithm that minimizes a loss function defined on the observed behavior and is compatible with Markov decision processes in which the agent is risk-sensitive.
Risk-sensitive inverse reinforcement learning via semi- and non-parametric methods
- Computer Science · Int. J. Robotics Res.
- 2018
Comparisons with a risk-neutral model show that the RS-IRL framework more accurately captures observed participant behavior both qualitatively and quantitatively, especially in scenarios where catastrophic outcomes such as collisions can occur.
Reinforcement Learning Beyond Expectation
- Computer Science · 2021 60th IEEE Conference on Decision and Control (CDC)
- 2021
Two algorithms to enable agents to learn policies that optimize the CPT-value are developed, and it is demonstrated that the behaviors of the agent learned using these algorithms are better aligned with those of a human user placed in the same environment, and are significantly improved over a baseline that optimizes an expected utility.
References
Showing 1-10 of 66 references
Cumulative Prospect Theory Meets Reinforcement Learning: Estimation and Control
- Computer Science · ArXiv
- 2015
Using an empirical distribution over the policy space in conjunction with Kullback-Leibler (KL) divergence to the reference distribution, this work obtains a global policy optimization scheme and provides theoretical convergence guarantees for all the proposed algorithms.
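A common concrete instance of such model-based parameter search is a cross-entropy-style loop: maintain a sampling distribution over policy parameters and repeatedly refit it to the elite samples (for a Gaussian family, the KL-projection step reduces to matching the elite mean and variance). The sketch below is purely illustrative; population size, elite fraction, and the Gaussian family are assumptions, not the reference's exact scheme.

```python
import numpy as np

def cross_entropy_search(f, dim, n_iter=50, pop=100, elite_frac=0.2, seed=0):
    """Gaussian cross-entropy-style parameter search (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(elite_frac * pop))
    for _ in range(n_iter):
        thetas = rng.normal(mean, std, size=(pop, dim))       # sample candidate parameters
        scores = np.array([f(t) for t in thetas])
        elite = thetas[np.argsort(scores)[-n_elite:]]         # keep the highest-scoring samples
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6  # refit the sampling distribution
    return mean

# Example: maximize a concave quadratic with optimum at (1, -2).
print(cross_entropy_search(lambda t: -np.sum((t - np.array([1.0, -2.0])) ** 2), dim=2))
```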
Actor-Critic Algorithms for Risk-Sensitive MDPs
- Computer Science · NIPS
- 2013
This paper considers both discounted and average-reward Markov decision processes, devises actor-critic algorithms for estimating the gradient and updating the policy parameters in the ascent direction, and establishes the convergence of these algorithms to locally risk-sensitive optimal policies.
Learning Algorithms for Risk-Sensitive Control
- Computer Science
- 2010
Two learning schemes, Q-learning and the actor-critic method, which give stochastic approximation versions of the traditional iterative schemes for solving dynamic programs, are described along with their convergence analysis.
Policy Gradients for CVaR-Constrained MDPs
- Computer Science · ALT
- 2014
Two algorithms are proposed that obtain a locally risk-optimal policy by employing four tools: stochastic approximation, mini batches, policy gradients and importance sampling, and an importance sampling based variance reduction scheme is incorporated into these algorithms.
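The risk functional constrained in this line of work, conditional value-at-risk (CVaR), has a simple empirical estimate: the mean of the worst tail of the sampled costs. The sketch below shows that standard estimator; the confidence level is an illustrative assumption.

```python
import numpy as np

def cvar(costs, alpha=0.95):
    """Empirical conditional value-at-risk of a cost sample: the mean of the
    worst (1 - alpha) tail, i.e. E[X | X >= VaR_alpha(X)]."""
    x = np.sort(np.asarray(costs, dtype=float))
    var = np.quantile(x, alpha)      # value-at-risk (alpha-quantile of the costs)
    return x[x >= var].mean()        # average of the tail beyond VaR

# Example: heavy losses in the tail dominate the CVaR even when the mean cost is small.
rng = np.random.default_rng(0)
print(cvar(rng.exponential(1.0, size=100_000), alpha=0.95))
```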
Algorithms for CVaR Optimization in MDPs
- Computer Science · NIPS
- 2014
This paper first derives a formula for computing the gradient of this risk-sensitive objective function, then devises policy gradient and actor-critic algorithms, each of which uses a specific method to estimate this gradient and update the policy parameters in the descent direction.
Policy Gradients with Variance Related Risk Criteria
- Computer Science · ICML
- 2012
A framework for local policy gradient style algorithms for reinforcement learning with variance-related criteria, covering objectives that involve both the expected cost and the variance of the cost.
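One simple objective of this kind is a variance-penalized cost, which a policy gradient method would estimate from sampled episodes and descend over policy parameters. A minimal sketch, with the penalty weight as an illustrative assumption:

```python
import numpy as np

def variance_penalized_cost(costs, mu=0.5):
    """Variance-related criterion of the kind targeted by such work:
    J = E[C] + mu * Var[C], to be minimized, estimated from sampled episode costs C.
    The penalty weight mu is an illustrative assumption."""
    c = np.asarray(costs, dtype=float)
    return c.mean() + mu * c.var()

# A policy gradient method would descend this estimate over policy parameters;
# here we simply evaluate it on a small sample of episode costs.
print(variance_penalized_cost([1.0, 2.0, 0.5, 1.5]))
```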
Reinforcement Learning With Function Approximation for Traffic Signal Control
- Computer Science · IEEE Transactions on Intelligent Transportation Systems
- 2011
A reinforcement learning (RL) algorithm with function approximation for traffic signal control that incorporates state-action features, is easily implementable in high-dimensional settings, and outperforms all the other algorithms on all the road network settings considered.
A simultaneous perturbation stochastic approximation-based actor-critic algorithm for Markov decision processes
- Mathematics · IEEE Transactions on Automatic Control
- 2004
A two-timescale simulation-based actor-critic algorithm for the solution of infinite-horizon Markov decision processes with finite state and compact action spaces under the discounted cost criterion is proposed, and a proof of convergence to a locally optimal policy is presented.