Verification of indefinite-horizon POMDPs

@inproceedings{Bork2020VerificationOI,
  title={Verification of indefinite-horizon POMDPs},
  author={Alexander Bork and Sebastian Junges and Joost-Pieter Katoen and Tim Quatmann},
  booktitle={ATVA},
  year={2020}
}
The verification problem in MDPs asks whether, for any policy resolving the nondeterminism, the probability that something bad happens is bounded by some given threshold. This verification problem is often overly pessimistic, as the policies it considers may depend on the complete system state. This paper considers the verification problem for partially observable MDPs, in which the policies make their decisions based on (the history of) the observations emitted by the system. We present an… 
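The verification approach in this line of work reasons about the belief MDP induced by the POMDP, whose states are probability distributions over the hidden states. The sketch below is illustrative only and is not the paper's algorithm: it unfolds the belief MDP of a hypothetical two-state, two-action POMDP up to a depth bound and computes a pessimistic under-approximation of the maximal probability of eventually observing the goal. The model, names, and cut-off scheme are assumptions made for the example.

# Minimal illustrative sketch (not the paper's algorithm): unfold the belief MDP
# of a tiny, hypothetical POMDP up to a depth bound and under-approximate the
# maximal probability of eventually observing the goal.

S = ["s0", "s1"]                     # hidden states; "s1" is the goal state
A = ["a", "b"]                       # actions
T = {                                # T[s][action] = list of (successor, probability)
    "s0": {"a": [("s0", 0.5), ("s1", 0.5)], "b": [("s0", 1.0)]},
    "s1": {"a": [("s1", 1.0)], "b": [("s1", 1.0)]},
}
O = {"s0": "o0", "s1": "goal"}       # deterministic observation per hidden state

def update(belief, action):
    """Belief update: returns (observation probability, successor belief) pairs."""
    pred = {}
    for s, p in belief.items():
        for s2, q in T[s][action]:
            pred[s2] = pred.get(s2, 0.0) + p * q
    by_obs = {}
    for s2, p in pred.items():
        dist = by_obs.setdefault(O[s2], {})
        dist[s2] = dist.get(s2, 0.0) + p
    return [(sum(d.values()), {s: p / sum(d.values()) for s, p in d.items()})
            for d in by_obs.values()]

def max_reach(belief, depth):
    """Maximal probability of eventually observing "goal", cut off at `depth`."""
    if all(O[s] == "goal" for s in belief):
        return 1.0                   # goal observed with certainty
    if depth == 0:
        return 0.0                   # pessimistic cut-off: unexplored beliefs count as 0
    return max(sum(mass * max_reach(b, depth - 1) for mass, b in update(belief, a))
               for a in A)

print(max_reach({"s0": 1.0}, depth=8))   # ~0.996; tends to 1.0 as the bound grows

In practice the naive depth bound would be replaced by a sound abstraction of the (generally infinite) belief MDP; the bound here is only to keep the example finite.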
Enforcing Almost-Sure Reachability in POMDPs
TLDR
This work presents an iterative symbolic approach that computes a winning region, that is, a set of system configurations such that all policies that stay within this set are guaranteed to satisfy the constraints.
Runtime Monitoring for Markov Decision Processes
TLDR
This work investigates the problem of monitoring partially observable systems with nondeterministic and probabilistic dynamics and presents a tractable algorithm based on model checking conditional reachability probabilities; experiments demonstrate the applicability of the algorithm to a range of benchmarks.
On the Verification of Belief Programs
TLDR
A formalism for belief programs based on a modal logic of actions and beliefs is proposed that smoothly accommodates PCTL-like temporal properties, and the decidability and undecidability of the verification problem for belief programs are investigated.
Under-Approximating Expected Total Rewards in POMDPs
TLDR
This work considers the problem of whether the optimal expected total reward to reach a goal state in a partially observable Markov decision process (POMDP) is below a given threshold, and provides two techniques: a simple cut-off technique that uses a good policy on the POMDP, and a more advanced belief-clipping technique that uses minimal shifts of probabilities between beliefs.
Reinforcement Learning under Partial Observability Guided by Learned Environment Models
TLDR
This work combines Q-learning with IoAlergia, a method for learning Markov decision processes (MDPs), and provides RL with additional observations in the form of abstract environment states by simulating new experiences on learned environment models to track the explored states.
Inductive Synthesis of Finite-State Controllers for POMDPs
TLDR
A novel learning framework for obtaining finite-state controllers (FSCs) for partially observable Markov decision processes is presented, together with its applicability to indefinite-horizon specifications.
Probabilistic Model Checking and Autonomy
TLDR
An overview of probabilistic model checking is provided, focusing on models supported by the PRISM and PRISM-games model checkers, including fully observable and partially observable Markov decision processes, as well as turn-based and concurrent stochastic games, together with associated probabilistic temporal logics.
Gradient-Descent for Randomized Controllers under Partial Observability
TLDR
This paper shows how to define and evaluate gradients of pMCs and investigates varieties of gradient descent techniques from the machine learning community to synthesize the probabilities in a pMC.
The Probabilistic Model Checker Storm
TLDR
The main features of Storm are reported, their effective use is explained, and an empirical evaluation of different configurations of Storm on the QComp 2019 benchmark set is presented.

References

Showing 1-10 of 41 references
Point-Based Methods for Model Checking in Partially Observable Markov Decision Processes
TLDR
This work shows how to use point-based value iteration methods to efficiently approximate the maximum probability of satisfying a desired logical formula and compute the associated belief state policy in a partially observable Markov decision process (POMDP).
Verification of Markov Decision Processes Using Learning Algorithms
TLDR
A general framework for applying machine-learning algorithms to the verification of Markov decision processes (MDPs) is presented; it focuses on probabilistic reachability, a core property for verification, and is illustrated through two distinct instantiations.
Optimistic Value Iteration
TLDR
This paper obtains a lower bound via standard value iteration, uses the result to “guess” an upper bound, and proves the latter’s correctness; this optimistic value iteration approach is presented for computing reachability probabilities as well as expected rewards.
Solving POMDPs by Searching the Space of Finite Policies
TLDR
This paper explores the problem of finding the optimal policy from a restricted set of policies, represented as finite-state automata of a given size, and demonstrates good empirical results with a branch-and-bound method for finding globally optimal deterministic policies and a gradient-ascent method for finding locally optimal stochastic policies.
Verification and control of partially observable probabilistic systems
TLDR
Probabilistic temporal logics are given that can express a range of quantitative properties of partially observable, probabilistic systems for both discrete and dense models of time, relating to the probability of an event’s occurrence or the expected value of a reward measure.
Bounded Model Checking for Probabilistic Programs
TLDR
This paper proposes an on-the-fly approach where the operational model is successively created and verified via a step-wise execution of the program, making it possible to take key features of many probabilistic programs into account: nondeterminism and conditioning.
Motion planning under partial observability using game-based abstraction
TLDR
This work addresses motion planning problems where agents move inside environments that are not fully observable and subject to uncertainties by exploiting typical structural properties of such scenarios; for instance, it assumes that agents have the ability to observe their own positions inside an environment.
Planning and Acting in Partially Observable Stochastic Domains
Point-Based Value Iteration for Finite-Horizon POMDPs
TLDR
This paper presents a general point-based value iteration algorithm for finite-horizon POMDP problems which provides solutions with guarantees on solution quality and introduces two heuristics to reduce the number of belief points considered during execution, which lowers the computational requirements.