The variance-penalized stochastic shortest path problem

Jakob Piribauer, Ocan Sankur, Christel Baier
The stochastic shortest path problem (SSPP) asks how to resolve the non-deterministic choices in a Markov decision process (MDP) so that the expected accumulated weight before reaching a target state is maximized. This paper addresses the optimization of the variance-penalized expectation (VPE) of the accumulated weight, a variant of the SSPP in which a multiple of the variance of the accumulated weight is incurred as a penalty. It is shown that the optimal VPE in MDPs with non-negative… 
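As an illustration of the objective (a minimal sketch, not code from the paper): for a fixed scheduler, an MDP induces a distribution over accumulated weights before reaching the target, and the VPE is the mean minus a penalty factor times the variance. The distributions and penalty factor below are hypothetical examples.

```python
def vpe(dist, lam):
    """dist: list of (weight, probability) pairs for the accumulated weight
    induced by some scheduler; lam: variance penalty factor.
    Returns E[W] - lam * Var[W]."""
    mean = sum(w * p for w, p in dist)
    second_moment = sum(w * w * p for w, p in dist)
    var = second_moment - mean ** 2
    return mean - lam * var

# Two hypothetical schedulers inducing different weight distributions:
risky = [(0, 0.5), (10, 0.5)]  # E = 5, Var = 25
safe = [(4, 1.0)]              # E = 4, Var = 0

print(vpe(risky, 0.1))  # 5 - 0.1 * 25 = 2.5
print(vpe(safe, 0.1))   # 4.0
```

Note how the penalty can reverse the preference: the risky scheduler has the higher expectation, but for penalty factor 0.1 the safe scheduler has the higher VPE.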

Partial and Conditional Expectations in Markov Decision Processes with Integer Weights

Two variants of the stochastic shortest path problem in Markov decision processes (MDPs) with integer weights are addressed, which constitute useful approaches to the analysis of systems without guarantees on the occurrence of an event of interest.

Stochastic Shortest Paths and Weight-Bounded Properties in Markov Decision Processes

It is shown that the problem to decide whether a disjunction of weight-bounded reachability conditions holds almost surely under some scheduler belongs to NP ∩ coNP, is solvable in pseudo-polynomial time and is at least as hard as solving two-player mean-payoff games.

Variance minimization of parameterized Markov decision processes

  • L. Xia
  • Discret. Event Dyn. Syst.
  • 2018
This paper studies the variance-minimization problem for Markov decision processes (MDPs) whose policies are parameterized by action-selection probabilities or other general parameters, and derives a difference formula for the reward variance under any two parametric policies.

Risk-Sensitive Markov Decision Processes with Combined Metrics of Mean and Variance

  • L. Xia
  • 2020
This paper investigates the optimization of an infinite-stage discrete-time Markov decision process (MDP) under a long-run average metric that combines the mean and variance of rewards. It develops an iterative algorithm in the form of policy iteration, which is proved to converge to local optima in both the mixed and the randomized policy space.

Maximizing the Conditional Expected Reward for Reaching the Goal

The paper addresses the problem of computing maximal conditional expected accumulated rewards until reaching a target state, briefly called maximal conditional expectations, in finite-state Markov decision processes.

Computing Minimum and Maximum Reachability Times in Probabilistic Systems

The methods are based on algorithms that transform the original problems into problems satisfying the required assumptions, and they lead to efficient solutions of two basic problems in the analysis of the reliability and performance of partially specified systems.

On Skolem-hardness and saturation points in Markov decision processes

The inherent mathematical difficulty of a series of optimization problems on Markov decision processes (MDPs) is shown by a reduction from the Positivity problem to the associated decision problems; as an immediate consequence, these problems are also at least as hard as the Skolem problem.

The Odds of Staying on Budget

This work studies the computational complexity of deciding whether the probability of the paths whose accumulated cost satisfies a Boolean combination of inequalities exceeds a given threshold, and shows that this problem is PP-complete, whereas for general Markov chains it is hard for the PosSLP problem and in PSPACE.
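To make the "staying on budget" question concrete, here is a minimal sketch (a hypothetical example, not the paper's algorithm) that computes, for a small acyclic Markov chain, the probability that the accumulated cost of a path reaching the target satisfies a single cost bound:

```python
def budget_probability(chain, costs, start, targets, budget):
    """chain: dict mapping a state to a list of (successor, probability) pairs;
    costs: dict mapping a transition (state, successor) to its cost.
    Returns the probability of reaching a target state with accumulated
    cost <= budget (assumes the chain is acyclic)."""
    def go(state, acc):
        if state in targets:
            return 1.0 if acc <= budget else 0.0
        return sum(p * go(nxt, acc + costs[(state, nxt)])
                   for nxt, p in chain[state])
    return go(start, 0)

# Hypothetical chain: from s, a cheap route via a and an expensive route via b.
chain = {"s": [("a", 0.5), ("b", 0.5)], "a": [("t", 1.0)], "b": [("t", 1.0)]}
costs = {("s", "a"): 1, ("s", "b"): 5, ("a", "t"): 1, ("b", "t"): 1}

print(budget_probability(chain, costs, "s", {"t"}, 3))  # 0.5: only s->a->t (cost 2) fits
```

The decision problem studied in the paper asks whether such a probability exceeds a given threshold, for Boolean combinations of cost inequalities rather than a single bound.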