Multi-objective Optimization of Long-run Average and Total Rewards

  • Tim Quatmann, Joost-Pieter Katoen
  • Published 26 October 2020
  • Computer Science
  • Tools and Algorithms for the Construction and Analysis of Systems, pages 230-249
This paper presents an efficient procedure for multi-objective model checking of long-run average reward (also known as mean pay-off) objectives, total reward objectives, and their combination. We consider this for Markov automata, a compositional model that captures both traditional Markov decision processes (MDPs) and a continuous-time variant thereof. The crux of our procedure is a generalization of Forejt et al.’s approach for total rewards on MDPs to arbitrary combinations of long-run and…


Value Iteration for Long-Run Average Reward in Markov Decision Processes
It is shown that a combination of applying VI locally for each maximal end-component (MEC) and VI for reachability objectives can provide approximation guarantees, and an anytime algorithm is presented that is able to deal with very large models.
Strategy Synthesis for Stochastic Games with Multiple Long-Run Objectives
This work shows that strategies constructed from Pareto set approximations of expected energy objectives are ε-optimal for the corresponding average rewards in turn-based stochastic games whose winning conditions are conjunctions of satisfaction objectives for long-run average rewards.
Markov Automata with Multiple Objectives
Algorithms to analyze several objectives simultaneously and approximate Pareto curves are presented, including, e.g., several (timed) reachability objectives, or various expected cost objectives.
Multi-cost Bounded Tradeoff Analysis in MDP
The need for more detailed visual presentations of results beyond Pareto curves is discussed and a first visualisation approach that exploits all the available information from the algorithm to support decision makers is presented.
Modelling and Analysis of Markov Reward Automata
This paper introduces Markov reward automata, an extension of Markov automata that allows the modelling of systems incorporating rewards (or costs) in addition to nondeterminism, discrete probabilistic choice and continuous stochastic timing.
Markov Decision Processes with Multiple Long-Run Average Objectives
Algorithms for design exploration in MDP models with multiple long-run average objectives are studied, including the problem of whether a given value vector is realizable by any strategy; it is shown that this problem can be decided in polynomial time for irreducible MDPs and in NP for all MDPs.
Long-Run Rewards for Markov Automata
The computation of long-run average rewards, the most classical problem in continuous-time Markov model analysis, is considered and an algorithm based on value iteration is proposed, which improves the state of the art by orders of magnitude.
Optimistic Value Iteration
This paper presents optimistic value iteration for computing reachability probabilities and expected rewards: it obtains a lower bound via standard value iteration, uses the result to “guess” an upper bound, and then proves the latter’s correctness.
Markov Decision Processes with Multiple Objectives
It is shown that every Pareto-optimal point can be achieved by a memoryless strategy; however, unlike in the single-objective case, the memoryless strategies may require randomization.
Efficient Strategy Iteration for Mean Payoff in Markov Decision Processes
This work provides several techniques that speed up strategy iteration by orders of magnitude for many MDPs, eliminating the performance disadvantage while preserving all its advantages.
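The three steps named in the Optimistic Value Iteration entry above (lower bound by value iteration, guessed upper bound, correctness check) can be illustrated with a minimal sketch. It is not the cited paper's algorithm: it assumes a plain Markov chain without nondeterminism, reachability probabilities only, and an additive guess `lower + eps`. The function names and the example chain are hypothetical. The check relies on a standard fact: any vector `u` with `P u + b <= u` componentwise is an upper bound on the least fixed point of `x = P x + b`.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def bellman(x: Vector, P: Matrix, b: Vector) -> Vector:
    """Apply x -> P x + b, where P holds transitions among non-target
    states and b holds the one-step probabilities of reaching the target."""
    return [sum(p * xj for p, xj in zip(row, x)) + bi for row, bi in zip(P, b)]


def optimistic_value_iteration(
    P: Matrix, b: Vector, eps: float = 1e-6, max_rounds: int = 100
) -> Tuple[Vector, Vector]:
    """Return (lower, upper) bounds on the reachability probabilities.

    Assumption of this sketch: the upper bound is guessed additively as
    lower + eps (capped at 1) and accepted only if it is inductive.
    """
    lower = [0.0] * len(b)  # iterating from 0 approaches the solution from below
    tol = eps
    for _ in range(max_rounds):
        # Step 1: standard value iteration from below until the change is small.
        while True:
            nxt = bellman(lower, P, b)
            diff = max(abs(n - l) for n, l in zip(nxt, lower))
            lower = nxt
            if diff < tol:
                break
        # Step 2: optimistically guess an upper bound.
        upper = [min(1.0, v + eps) for v in lower]
        # Step 3: accept the guess only if P*upper + b <= upper holds everywhere.
        if all(fu <= u for fu, u in zip(bellman(upper, P, b), upper)):
            return lower, upper
        tol /= 2  # guess rejected: tighten the lower bound and try again
    raise RuntimeError("no verified upper bound found within max_rounds")


# Hypothetical two-state chain. State 0 loops with probability 0.5 and moves
# to state 1 with probability 0.25; state 1 reaches the target with
# probability 0.5. All remaining probability mass goes to a sink.
P = [[0.5, 0.25], [0.0, 0.0]]
b = [0.0, 0.5]
lower, upper = optimistic_value_iteration(P, b)
```

Solving the linear equations for this chain by hand gives 0.25 for state 0 and 0.5 for state 1, and the returned bounds enclose these values. A rejected guess costs only one extra Bellman step, so the sketch simply halves the stopping tolerance and retries.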