# Finite-State Controllers of POMDPs using Parameter Synthesis

@article{Junges2018FiniteStateCO, title={Finite-State Controllers of POMDPs using Parameter Synthesis}, author={Sebastian Junges and N. Jansen and Ralf Wimmer and Tim Quatmann and Leonore Winterer and Joost-Pieter Katoen and Bernd Becker}, journal={ArXiv}, year={2018}, volume={abs/1710.10294} }

Uncertainty in Artificial Intelligence: Thirty-Fourth Conference (2018) August 6-10, 2018, Monterey, California, USA
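The paper synthesizes finite-state controllers by viewing the controller's randomized choices as parameters of an induced parametric Markov chain (pMC). A minimal sketch of that induction step, on a toy POMDP with illustrative names (this code is not from the paper):

```python
def induced_chain(transitions, obs, controller):
    """Substitute a memoryless randomized controller into a POMDP.

    transitions[s][a] = {s': prob}  -- POMDP dynamics
    obs[s]            = observation emitted in state s
    controller[z]     = {a: prob}   -- randomized action choice per observation
    Returns the induced Markov chain {s: {s': prob}}.
    """
    chain = {}
    for s, by_action in transitions.items():
        dist = {}
        for a, succ in by_action.items():
            weight = controller[obs[s]].get(a, 0.0)
            for s2, p in succ.items():
                dist[s2] = dist.get(s2, 0.0) + weight * p
        chain[s] = dist
    return chain

# Toy POMDP: two states sharing one observation "z", two actions.
transitions = {
    "s0": {"a": {"s0": 0.5, "s1": 0.5}, "b": {"s1": 1.0}},
    "s1": {"a": {"s1": 1.0},            "b": {"s0": 1.0}},
}
obs = {"s0": "z", "s1": "z"}
controller = {"z": {"a": 0.3, "b": 0.7}}  # one parameter instantiation

chain = induced_chain(transitions, obs, controller)
# each row of the induced chain is again a probability distribution
assert all(abs(sum(d.values()) - 1.0) < 1e-9 for d in chain.values())
```

Leaving the controller probabilities symbolic instead of fixing them numerically yields transition probabilities that are polynomials in the parameters, which is what parameter-synthesis tools operate on.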

## 45 Citations

Finite-state Controllers of POMDPs via Parameter Synthesis

- Computer Science
- 2019

This work studies finite-state controllers for partially observable Markov decision processes (POMDPs) that are provably correct with respect to given specifications and shows comparable performance to well-known POMDP solvers.

Certified Reinforcement Learning with Logic Guidance

- Mathematics, Computer Science, ArXiv
- 2019

This paper proposes the first model-free Reinforcement Learning (RL) framework to synthesise policies for unknown, and continuous-state Markov Decision Processes (MDPs), such that a given linear…

Point-Based Methods for Model Checking in Partially Observable Markov Decision Processes

- Computer Science, Mathematics, AAAI
- 2020

This work shows how to use point-based value iteration methods to efficiently approximate the maximum probability of satisfying a desired logical formula and compute the associated belief state policy in a partially observable Markov decision process (POMDP).
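The belief-state policies such point-based methods compute rest on the standard Bayesian belief update. A minimal sketch on a toy two-state POMDP (all names illustrative, not from the cited paper):

```python
def belief_update(belief, action, observation, T, O):
    """One Bayes filter step: b'(s') ∝ O(o|s') · Σ_s T(s'|s,a) · b(s).

    T[s][a] = {s': prob}  -- transition model
    O[s]    = {o: prob}   -- observation model
    """
    new = {}
    for s, b in belief.items():
        for s2, p in T[s][action].items():
            new[s2] = new.get(s2, 0.0) + b * p * O[s2].get(observation, 0.0)
    norm = sum(new.values())  # probability of observing o after action a
    return {s: v / norm for s, v in new.items()}

# Toy model: observation "o1" is much likelier in state s1 than in s0.
T = {"s0": {"a": {"s0": 0.9, "s1": 0.1}}, "s1": {"a": {"s1": 1.0}}}
O = {"s0": {"o0": 0.8, "o1": 0.2}, "s1": {"o0": 0.3, "o1": 0.7}}

b2 = belief_update({"s0": 0.5, "s1": 0.5}, "a", "o1", T, O)
# seeing "o1" shifts the belief toward s1
```

Point-based solvers approximate the value function on a finite set of such belief points rather than over the full belief simplex.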

Gradient-Descent for Randomized Controllers under Partial Observability

- Computer Science, VMCAI
- 2022

This paper shows how to define and evaluate gradients of pMCs and investigates varieties of gradient descent techniques from the machine learning community to synthesize the probabilities in a pMC.
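The reachability probability of a pMC is a rational function of its parameters, so one can climb its gradient. A minimal one-parameter sketch using a finite-difference gradient (toy pMC and numeric gradient for illustration; the cited paper evaluates gradients of pMCs directly):

```python
def reach_prob(p):
    # Toy pMC: s0 --p--> goal, s0 --(1-p)--> s1, s1 --0.5--> s0, s1 --0.5--> sink.
    # Solving x0 = p + (1-p)*x1, x1 = 0.5*x0 gives Pr(reach goal) = 2p / (1 + p).
    return 2 * p / (1 + p)

def gradient_ascent(f, p=0.1, step=0.2, eps=1e-3, iters=200):
    for _ in range(iters):
        grad = (f(p + eps) - f(p - eps)) / (2 * eps)  # central difference
        p = min(1 - eps, max(eps, p + step * grad))   # stay a valid probability
    return p

p_star = gradient_ascent(reach_prob)
```

Since the objective here is monotone in `p`, the ascent drives the parameter toward its upper bound; real benchmarks have many interacting parameters, which is where the choice of descent variant matters.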

Human-in-the-Loop Synthesis for Partially Observable Markov Decision Processes

- Computer Science, 2018 Annual American Control Conference (ACC)
- 2018

Experiments show that including humans in the POMDP verification loop improves the state of the art by orders of magnitude in terms of scalability.

Unpredictable Planning Under Partial Observability

- Computer Science, 2019 IEEE 58th Conference on Decision and Control (CDC)
- 2019

It is proved that a decision-maker with perfect observations can randomize its paths at least as well as a decision-maker with partial observations, and it is shown that the maximum entropy of a POMDP is lower bounded by the maximum entropy of this pMC.

Convex Optimization meets Parameter Synthesis for MDPs

- Computer Science
- 2019

Model checking is a well-studied technique that provides guarantees on appropriate behavior for all possible events and scenarios and can be applied to systems with stochastic uncertainties, including discrete-time Markov chains, Markov decision processes (MDPs), and their continuous-time counterparts.

Enforcing Almost-Sure Reachability in POMDPs

- Computer Science, CAV
- 2021

This work presents an iterative symbolic approach that computes a winning region, that is, a set of system configurations such that all policies that stay within this set are guaranteed to satisfy the constraints.
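A simplified illustration of the underlying fixed point (for a fully observable MDP, not the paper's symbolic observation-based algorithm): iteratively restrict to states that have an action keeping all successors inside the candidate region while still being able to reach the target.

```python
def almost_sure_reach(states, succ, target):
    """States from which the target is reached with probability 1 in an MDP.

    succ[s][a] = set of possible successor states of action a in state s.
    """
    good = set(states)
    while True:
        # Least fixed point: s joins `reach` if some action keeps all
        # successors inside `good` and can move toward `reach`.
        reach = set(target) & good
        changed = True
        while changed:
            changed = False
            for s in good - reach:
                for nxt in succ[s].values():
                    if nxt <= good and nxt & reach:
                        reach.add(s)
                        changed = True
                        break
        if reach == good:
            return good
        good = reach  # prune states that lost all safe actions, repeat

# Toy MDP: 2 is a trap; 4 can reach the target 3 only with residual
# probability of falling into the trap, so 4 is not almost-surely winning.
succ = {
    0: {"a": {1}, "b": {2}},
    1: {"a": {3}},
    2: {"a": {2}},
    3: {"a": {3}},
    4: {"a": {2, 3}},
}
region = almost_sure_reach({0, 1, 2, 3, 4}, succ, {3})
```

The winning region in the POMDP setting additionally requires the pruning to respect observation-equivalence of states, which is what the paper's symbolic encoding handles.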

Robust Policy Synthesis for Uncertain POMDPs via Convex Optimization

- Computer Science, IJCAI
- 2020

The feasibility of the approach, which provides a transformation of the problem to a convex QCQP with finitely many constraints, is demonstrated by means of several case studies that highlight typical bottlenecks for the problem.

## References

Showing 1–10 of 55 references

POMDP solution methods

- Computer Science
- 2003

This is an overview of partially observable Markov decision processes (POMDPs). We describe POMDP value and policy iteration as well as gradient ascent algorithms. The emphasis is on solution methods…

Probabilistic robotics

- Computer Science, CACM
- 2002

This research presents a novel approach to planning and navigation algorithms that exploit statistics gleaned from uncertain, imperfect real-world environments to guide robots toward their goals and around obstacles.

Safety-Constrained Reinforcement Learning for MDPs

- Computer Science, TACAS
- 2016

This work abstracts controller synthesis for stochastic, partially unknown environments in which safety is essential as a Markov decision process whose expected performance is measured using a cost function that is unknown prior to run-time exploration of the state space.

On the Computational Complexity of Stochastic Controller Optimization in POMDPs

- Computer Science, TOCT
- 2012

The result establishes that the more general problem of stochastic controller optimization in POMDPs is also NP-hard, and outlines a special case that is convex and admits efficient global solutions.

An Optimal Best-First Search Algorithm for Solving Infinite Horizon DEC-POMDPs

- Computer Science, Mathematics, ECML
- 2005

This work develops the first complete and optimal algorithm that is able to extract deterministic policy vectors based on finite state controllers for a cooperative team of agents and extends best-first search methods to the domain of decentralized control theory.

Permissive Controller Synthesis for Probabilistic Systems

- Mathematics, TACAS
- 2014

A permissive controller synthesis framework is developed, which generates multi-strategies for the controller, offering a choice of control actions to take at each time step, and formalises the notion of permissivity using penalties.

Control of probabilistic systems under dynamic, partially known environments with temporal logic specifications

- Computer Science, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC)
- 2012

This work considers the synthesis of control policies for probabilistic systems, modeled by Markov decision processes, operating in partially known environments with temporal logic specifications, using Markov chains to describe the behavior of the environment in each mode.

Finding Approximate POMDP solutions Through Belief Compression

- Computer Science, J. Artif. Intell. Res.
- 2005

This thesis describes a scalable approach to POMDP planning which uses low-dimensional representations of the belief space, and demonstrates how a variant of Principal Components Analysis (PCA) called Exponential family PCA can be used to compress certain kinds of large real-world POMDPs and find policies for these problems.

A Symbolic SAT-Based Algorithm for Almost-Sure Reachability with Small Strategies in POMDPs

- Mathematics, Computer Science, AAAI
- 2016

This work first studies the existence of observation-stationary strategies, which is NP-complete, then small-memory strategies, and presents a symbolic algorithm for almost-sure reachability based on an efficient encoding to SAT.