Simple Strategies in Multi-Objective MDPs

@article{Delgrange2020SimpleSI,
  title={Simple Strategies in Multi-Objective MDPs},
  author={Florent Delgrange and Joost-Pieter Katoen and Tim Quatmann and Mickael Randour},
  journal={Tools and Algorithms for the Construction and Analysis of Systems},
  year={2020},
  volume={12078},
  pages={346--364}
}
We consider the verification of multiple expected-reward objectives at once on Markov decision processes (MDPs). This enables a trade-off analysis among the objectives by obtaining a Pareto front. We focus on strategies that are easy to employ and implement, i.e., strategies that are pure (no randomization) and use bounded memory. We show that checking whether a point is achievable by a pure stationary strategy is NP-complete, even for two objectives, and we provide an MILP encoding to…
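To make the flavor of such an encoding concrete, below is a minimal sketch of an MILP feasibility check for pure-stationary achievability on a toy two-objective MDP, written with PuLP. The toy MDP, the big-M linking of occupancy variables to action choices, the bound M, and the thresholds are illustrative assumptions, not the paper's exact encoding.

```python
# Feasibility MILP: is the point (t1, t2) achievable by a pure stationary
# strategy on a small total-reward MDP? Binary variables pick one action per
# state; occupancy variables are linked to the chosen actions via big-M.
# The toy MDP, bound M, and thresholds below are illustrative assumptions.
from pulp import LpProblem, LpVariable, LpStatusOptimal, lpSum

# P[s][a] = list of (successor, probability); state 2 is absorbing (target).
P = {0: {'a': [(1, 1.0)], 'b': [(2, 1.0)]},
     1: {'a': [(2, 1.0)], 'b': [(0, 0.5), (2, 0.5)]}}
r1 = {(0, 'a'): 2.0, (0, 'b'): 0.0, (1, 'a'): 0.0, (1, 'b'): 1.0}
r2 = {(0, 'a'): 0.0, (0, 'b'): 3.0, (1, 'a'): 2.0, (1, 'b'): 0.0}
init, thresholds, M = 0, (2.0, 2.0), 100.0

prob = LpProblem("pure_stationary_achievability")
sa = [(s, a) for s in P for a in P[s]]
x = {(s, a): LpVariable(f"x_{s}_{a}", lowBound=0) for (s, a) in sa}
sig = {(s, a): LpVariable(f"sig_{s}_{a}", cat="Binary") for (s, a) in sa}

for s in P:                                  # purity: one action per state
    prob += lpSum(sig[(s, a)] for a in P[s]) == 1
for k in sa:                                 # occupancy only where chosen
    prob += x[k] <= M * sig[k]
for s in P:                                  # out-flow = initial mass + in-flow
    inflow = lpSum(p * x[(s2, a)] for s2 in P for a in P[s2]
                   for (t, p) in P[s2][a] if t == s)
    prob += lpSum(x[(s, a)] for a in P[s]) == (1.0 if s == init else 0.0) + inflow
prob += lpSum(r1[k] * x[k] for k in sa) >= thresholds[0]  # objective 1
prob += lpSum(r2[k] * x[k] for k in sa) >= thresholds[1]  # objective 2

prob.solve()
print("achievable" if prob.status == LpStatusOptimal else "not achievable")
```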
Simple Strategies in Multi-Objective MDPs (Technical Report)
TLDR
It is shown that checking whether a point is achievable by a pure stationary strategy is NP-complete, even for two objectives, and the authors provide an MILP encoding to solve the corresponding problem.
On Minimizing Total Discounted Cost in MDPs Subject to Reachability Constraints
We study the synthesis of a policy in a Markov decision process (MDP) following which an agent reaches a target state in the MDP while minimizing its total discounted cost. The problem combines a
Stochastic Games with Disjunctions of Multiple Objectives (Technical Report)
TLDR
A fine-grained overview of strategy and computational complexity is presented, and a novel value iteration-style algorithm for approximating the set of Pareto-optimal thresholds for a given disjunctive query (DQ) is proposed.
Evolutionary-Guided Synthesis of Verified Pareto-Optimal MDP Policies
TLDR
This work uses case studies from the service-based systems and robotic control software domains to show that the new MDP policy synthesis approach can handle a wide range of QoS requirement combinations unsupported by current probabilistic model checkers.
Arena-Independent Finite-Memory Determinacy in Stochastic Games
TLDR
These contributions further the understanding of arena-independent finite-memory (AIFM) determinacy, i.e., the study of objectives for which memory is needed, but in a way that only depends on limited parameters of the game graphs.
Multi-Objective Controller Synthesis with Uncertain Human Preferences
TLDR
This work formalizes the notion of uncertain human preferences, and presents a novel approach that accounts for this uncertainty in the context of multi-objective controller synthesis for Markov decision processes (MDPs).
Stochastic Games with Lexicographic Reachability-Safety Objectives
TLDR
An algorithm is presented that computes lexicographically optimal strategies via a reduction to the computation of optimal strategies in a sequence of single-objective games.
Different strokes in randomised strategies: Revisiting Kuhn's theorem under finite-memory assumptions
TLDR
This work studies two-player turn-based stochastic games and provides a complete taxonomy of the classes of finite-memory strategies obtained by varying which of the strategy's three components (initialization, memory updates, and action outputs) are randomized.
Games Where You Can Play Optimally with Arena-Independent Finite Memory
TLDR
A complete characterization of preference relations that admit optimal strategies using arena-independent finite memory is established, generalizing the work of Gimbert and Zielonka to the finite-memory case and proving an analogue of their celebrated corollary.
Entropy-Guided Control Improvisation
TLDR
This framework, which extends the state of the art by supporting arbitrary combinations of adversarial and probabilistic uncertainty in the environment, enables a flexible modeling formalism which, it is argued theoretically and empirically, remains tractable.

References

Showing 1–10 of 51 references.
Simple Strategies in Multi-Objective MDPs (Technical Report)
TLDR
It is shown that checking whether a point is achievable by a pure stationary strategy is NP-complete, even for two objectives, and the authors provide an MILP encoding to solve the corresponding problem.
Markov Decision Processes with Multiple Objectives
TLDR
It is shown that every Pareto-optimal point can be achieved by a memoryless strategy; however, unlike in the single-objective case, the memoryless strategies may require randomization.
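The classic intuition behind the need for randomization can be seen on a one-step example: with two actions whose reward vectors are (1, 0) and (0, 1), pure strategies only reach the two corner points, while randomizing between them sweeps out the whole line segment of the Pareto front. A minimal sketch with assumed toy numbers:

```python
# One decision, two actions with reward vectors (1,0) and (0,1). Pure
# memoryless strategies attain only the corners; randomized memoryless
# strategies attain every convex combination, e.g. the midpoint (0.5, 0.5).
import numpy as np

rewards = np.array([[1.0, 0.0],    # action a favors objective 1
                    [0.0, 1.0]])   # action b favors objective 2

print("pure strategies:", [tuple(r) for r in rewards])
for p in (0.25, 0.5, 0.75):        # probability of picking action a
    print(f"randomized, p={p}:", tuple(p * rewards[0] + (1 - p) * rewards[1]))
```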
Unifying Two Views on Multiple Mean-Payoff Objectives in Markov Decision Processes
TLDR
This work considers Markov decision processes with multiple mean-payoff (limit-average) objectives and presents a complete characterization of the strategy complexity (in terms of memory bounds and randomization) required to solve the problem.
On Finding Compromise Solutions in Multiobjective Markov Decision Processes
TLDR
This work uses an alternative optimality concept that formalizes the notion of a best compromise solution, i.e., a policy yielding an expected-utility vector as close as possible to a reference point, and shows that this notion of optimality depends on the initial state.
Threshold Constraints with Guarantees for Parity Objectives in Markov Decision Processes
TLDR
This work extends the framework of [BFRR14] and follow-up papers by addressing the case of ω-regular conditions encoded as parity objectives, a natural way to represent functional requirements of systems, and establishes that, for all variants of this problem, deciding the existence of a strategy lies in NP ∩ coNP.
Multi-Objective Model Checking of Markov Decision Processes
TLDR
It is shown that one can compute an approximate Pareto curve with respect to a set of ω-regular properties in time polynomial in the size of the MDP.
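One common way such approximations are computed in practice (a general technique, not necessarily this paper's algorithm for ω-regular properties) is weighted-sum scalarization: each weight vector turns the multi-objective problem into a single-objective one, and the resulting optimal policy yields one point on the convex part of the Pareto curve. A sketch on an assumed toy discounted MDP:

```python
# Sampling the convex Pareto curve of a two-objective discounted MDP by
# weighted-sum scalarization: run single-objective value iteration on the
# reward w1*r1 + w2*r2, then evaluate the greedy policy on both objectives.
# The MDP, discount factor, and weight grid are illustrative assumptions.
import numpy as np

n_s, gamma = 2, 0.9
P = np.array([[[0.9, 0.1], [0.1, 0.9]],    # P[s, a, s']
              [[0.5, 0.5], [0.8, 0.2]]])
R = np.array([[[1.0, 0.0], [0.2, 0.5]],    # R[0]: objective-1 rewards r1[s, a]
              [[0.0, 1.0], [0.7, 0.1]]])   # R[1]: objective-2 rewards r2[s, a]

def greedy_policy(w, iters=500):
    """Value iteration on the scalarized reward; returns the greedy policy."""
    r = np.tensordot(w, R, axes=1)         # r[s, a] = w . (r1[s,a], r2[s,a])
    v = np.zeros(n_s)
    for _ in range(iters):
        v = (r + gamma * P @ v).max(axis=1)
    return (r + gamma * P @ v).argmax(axis=1)

def evaluate(pi):
    """Exact per-objective value of a stationary policy, from state 0."""
    Ppi = P[np.arange(n_s), pi]            # Markov chain induced by pi
    return tuple(np.linalg.solve(np.eye(n_s) - gamma * Ppi,
                                 R[i][np.arange(n_s), pi])[0] for i in range(2))

for w1 in np.linspace(0.0, 1.0, 5):
    pi = greedy_policy(np.array([w1, 1.0 - w1]))
    print(f"w = ({w1:.2f}, {1 - w1:.2f}) -> Pareto point {evaluate(pi)}")
```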
Multi-weighted Markov Decision Processes with Reachability Objectives
In this paper, we are interested in the synthesis of schedulers in double-weighted Markov decision processes, which satisfy both a percentile constraint over a weighted reachability condition, and a
Computing Optimal Stationary Policies for Multi-Objective Markov Decision Processes
  • M. Wiering, E. de Jong
  • Computer Science
    2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning
  • 2007
TLDR
It is proved that the CON-MODP algorithm converges to the Pareto optimal set of value functions and policies for deterministic infinite horizon discounted multi-objective Markov decision processes.
A Survey of Multi-Objective Sequential Decision-Making
TLDR
This article surveys algorithms designed for sequential decision-making problems with multiple objectives and proposes a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function, and the type of policies considered.
Markov Automata with Multiple Objectives
TLDR
Algorithms to analyze several objectives simultaneously and to approximate Pareto curves are presented, supporting, e.g., several (timed) reachability objectives or various expected-cost objectives.