# Simple Strategies in Multi-Objective MDPs

    @article{Delgrange2020SimpleSI,
      title   = {Simple Strategies in Multi-Objective MDPs},
      author  = {Florent Delgrange and Joost-Pieter Katoen and Tim Quatmann and Mickael Randour},
      journal = {Tools and Algorithms for the Construction and Analysis of Systems},
      year    = {2020},
      volume  = {12078},
      pages   = {346--364}
    }

We consider the verification of multiple expected reward objectives at once on Markov decision processes (MDPs). This enables a trade-off analysis among the objectives by computing a Pareto front. We focus on strategies that are easy to employ and implement, i.e., strategies that are pure (no randomization) and have bounded memory. We show that checking whether a point is achievable by a pure stationary strategy is NP-complete, even for two objectives, and we provide an MILP encoding to…
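To make the setting concrete, here is a minimal sketch (not the paper's MILP encoding) of what the abstract describes: a toy two-objective MDP in which every pure stationary strategy induces a vector of expected total rewards, and the Pareto front is the set of non-dominated vectors. The MDP, its rewards, and the brute-force enumeration are invented for illustration; enumeration only works for tiny models, which is exactly why the paper resorts to an MILP.

```python
import itertools
import numpy as np

# Toy MDP: states 0 and 1 are transient, state 2 is absorbing.
# transitions[s][a] = list of (next_state, probability)
# rewards[s][a] = (reward for objective 1, reward for objective 2)
transitions = {
    0: {"a": [(1, 1.0)], "b": [(2, 1.0)]},
    1: {"a": [(2, 1.0)], "b": [(0, 0.5), (2, 0.5)]},
}
rewards = {
    0: {"a": (1.0, 0.0), "b": (0.0, 2.0)},
    1: {"a": (3.0, 0.0), "b": (0.0, 1.0)},
}

def value(policy):
    """Expected total reward vector from state 0 under a pure stationary
    policy, via the linear system v = r_pi + P_pi v on transient states."""
    states = sorted(transitions)
    n = len(states)
    P = np.zeros((n, n))
    r = np.zeros((n, 2))
    for i, s in enumerate(states):
        a = policy[s]
        r[i] = rewards[s][a]
        for t, p in transitions[s][a]:
            if t in transitions:           # skip the absorbing state
                P[i, states.index(t)] += p
    v = np.linalg.solve(np.eye(n) - P, r)  # v = (I - P)^{-1} r
    return tuple(v[0])                     # value vector at initial state 0

# Enumerate all pure stationary policies (feasible only for tiny MDPs).
points = {}
for choice in itertools.product(*(transitions[s] for s in sorted(transitions))):
    policy = dict(zip(sorted(transitions), choice))
    points[tuple(policy.items())] = value(policy)

def pareto(points):
    """Keep the points not weakly dominated by a different point."""
    return {k: v for k, v in points.items()
            if not any(w[0] >= v[0] and w[1] >= v[1] and w != v
                       for w in points.values())}

for pol, val in pareto(points).items():
    print(dict(pol), "->", val)
```

Here the four pure stationary policies yield the value vectors (4, 0), (2, 2), and (0, 2) at state 0, and the Pareto front consists of (4, 0) and (2, 2): no single policy maximizes both objectives, which is the trade-off the Pareto front exposes.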

## 13 Citations

Simple Strategies in Multi-Objective MDPs (Technical Report)

- Computer Science, ArXiv
- 2019

It is shown that checking whether a point is achievable by a pure stationary strategy is NP-complete, even for two objectives, and the authors provide an MILP encoding to solve the corresponding problem.

On Minimizing Total Discounted Cost in MDPs Subject to Reachability Constraints

- Mathematics
- 2021

We study the synthesis of a policy in a Markov decision process (MDP) following which an agent reaches a target state in the MDP while minimizing its total discounted cost. The problem combines a…

Stochastic Games with Disjunctions of Multiple Objectives (Technical Report)

- Computer Science, GandALF
- 2021

A fine-grained overview of strategy and computational complexity is presented and a novel value iteration-style algorithm for approximating the set of Pareto optimal thresholds for a given DQ is proposed.

Evolutionary-Guided Synthesis of Verified Pareto-Optimal MDP Policies

- Computer Science, 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE)
- 2021

This work uses case studies from the service-based systems and robotic control software domains to show that the new MDP policy synthesis approach can handle a wide range of QoS requirement combinations unsupported by current probabilistic model checkers.

Arena-Independent Finite-Memory Determinacy in Stochastic Games

- Computer Science, CONCUR
- 2021

These contributions further the understanding of arena-independent finite-memory (AIFM) determinacy, i.e., the study of objectives for which memory is needed, but in a way that only depends on limited parameters of the game graphs.

Multi-Objective Controller Synthesis with Uncertain Human Preferences

- Computer Science, 2022 ACM/IEEE 13th International Conference on Cyber-Physical Systems (ICCPS)
- 2022

This work formalizes the notion of uncertain human preferences, and presents a novel approach that accounts for this uncertainty in the context of multi-objective controller synthesis for Markov decision processes (MDPs).

Stochastic Games with Lexicographic Reachability-Safety Objectives

- Computer Science, CAV
- 2020

An algorithm is presented that computes lexicographically optimal strategies via a reduction to computation of optimal strategies in a sequence of single-objectives games.

Different strokes in randomised strategies: Revisiting Kuhn's theorem under finite-memory assumptions

- Computer Science, ArXiv
- 2022

This work studies two-player turn-based stochastic games and provides a complete taxonomy of the classes of finite-memory strategies obtained by varying which of the three aforementioned components are randomised.

Games Where You Can Play Optimally with Arena-Independent Finite Memory

- Computer Science, CONCUR
- 2020

A complete characterization of preference relations that admit optimal strategies using arena-independent finite memory is established, generalizing the work of Gimbert and Zielonka to the finite-memory case and proving an equivalent to their celebrated corollary.

Entropy-Guided Control Improvisation

- Computer Science, Robotics: Science and Systems
- 2021

This framework extends the state of the art by supporting arbitrary combinations of adversarial and probabilistic uncertainty in the environment, enabling a flexible modeling formalism that, as argued both theoretically and empirically, remains tractable.

## References

Showing 1-10 of 51 references.

Simple Strategies in Multi-Objective MDPs (Technical Report)

- Computer Science, ArXiv
- 2019

It is shown that checking whether a point is achievable by a pure stationary strategy is NP-complete, even for two objectives, and the authors provide an MILP encoding to solve the corresponding problem.

Markov Decision Processes with Multiple Objectives

- Computer Science, STACS
- 2006

It is shown that every Pareto-optimal point can be achieved by a memoryless strategy; however, unlike in the single-objective case, the memoryless strategies may require randomization.

Unifying Two Views on Multiple Mean-Payoff Objectives in Markov Decision Processes

- Computer Science, 2015 30th Annual ACM/IEEE Symposium on Logic in Computer Science
- 2015

This work considers Markov decision processes with multiple mean-payoff (limit-average) objectives, and presents a complete characterization of the strategy complexity (in terms of memory bounds and randomization) required to solve the problem.

On Finding Compromise Solutions in Multiobjective Markov Decision Processes

- Computer Science, ECAI
- 2010

This work uses an alternative optimality concept that formalizes the notion of a best compromise solution, i.e., a policy yielding an expected-utility vector as close as possible to a reference point, and shows that this notion of optimality depends on the initial state.

Threshold Constraints with Guarantees for Parity Objectives in Markov Decision Processes

- Computer Science, ICALP
- 2017

This work extends the framework of [BFRR14] and follow-up papers to ω-regular conditions encoded as parity objectives, a natural way to represent functional requirements of systems, and establishes that, for all variants of this problem, deciding the existence of a strategy lies in NP ∩ coNP.

Multi-Objective Model Checking of Markov Decision Processes

- Computer Science, Mathematics, Log. Methods Comput. Sci.
- 2007

It is shown that one can compute an approximate Pareto curve with respect to a set of ω-regular properties in time polynomial in the size of the MDP.

Multi-weighted Markov Decision Processes with Reachability Objectives

- Mathematics, GandALF
- 2018

In this paper, we are interested in the synthesis of schedulers in double-weighted Markov decision processes, which satisfy both a percentile constraint over a weighted reachability condition, and a…

Computing Optimal Stationary Policies for Multi-Objective Markov Decision Processes

- Computer Science, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning
- 2007

It is proved that the CON-MODP algorithm converges to the Pareto optimal set of value functions and policies for deterministic infinite horizon discounted multi-objective Markov decision processes.

A Survey of Multi-Objective Sequential Decision-Making

- Computer Science, J. Artif. Intell. Res.
- 2013

This article surveys algorithms designed for sequential decision-making problems with multiple objectives and proposes a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function, and the type of policies considered.

Markov Automata with Multiple Objectives

- Computer Science, CAV
- 2017

Algorithms are presented to analyze several objectives simultaneously, e.g., several (timed) reachability objectives or various expected cost objectives, and to approximate Pareto curves.