Corpus ID: 24002342

Finite-State Controllers of POMDPs using Parameter Synthesis

@article{Junges2018FiniteStateCO,
  title={Finite-State Controllers of POMDPs using Parameter Synthesis},
  author={Sebastian Junges and N. Jansen and Ralf Wimmer and Tim Quatmann and Leonore Winterer and Joost-Pieter Katoen and Bernd Becker},
  journal={ArXiv},
  year={2018},
  volume={abs/1710.10294}
}
Uncertainty in Artificial Intelligence: Thirty-Fourth Conference (UAI 2018), August 6-10, 2018, Monterey, California, USA
Finite-state Controllers of POMDPs via Parameter Synthesis
TLDR
This work studies finite-state controllers for partially observable Markov decision processes (POMDPs) that are provably correct with respect to given specifications and shows comparable performance to well-known POMDP solvers.
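To make the connection concrete: fixing a randomized controller and substituting its decision probabilities into the POMDP induces a parametric Markov chain (pMC) whose reachability probability is a rational function of the controller parameters, which is the parameter-synthesis view the title refers to. A minimal sketch on a two-state toy model, assuming sympy for the symbolic algebra (the model and all names are ours, not the paper's benchmarks):

```python
# Hedged sketch: fixing a memoryless randomized controller for a toy POMDP
# yields a pMC whose reachability probability is rational in the parameter p.
import sympy as sp

p = sp.Symbol('p', positive=True)      # Pr[action a | the shared observation]

# Toy model: s0 and s1 emit the same observation; 'goal'/'sink' are absorbing.
# Under the controller: from s0, action a reaches goal, action b moves to s1;
# from s1, action a falls into sink, action b reaches goal.
x0, x1 = sp.symbols('x0 x1')           # x_i = Pr[reach goal from s_i]
eqs = [sp.Eq(x0, p + (1 - p) * x1),
       sp.Eq(x1, (1 - p) * 1)]
reach = sp.simplify(sp.solve(eqs, [x0, x1], dict=True)[0][x0])
print(reach)                           # simplifies to p**2 - p + 1

# Parameter synthesis: for which p does the controller meet Pr >= 0.9?
print(sp.solve(reach >= sp.Rational(9, 10), p))
```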
Certified Reinforcement Learning with Logic Guidance
This paper proposes the first model-free Reinforcement Learning (RL) framework to synthesise policies for unknown, continuous-state Markov Decision Processes (MDPs), such that a given linear temporal logic (LTL) property is satisfied.
Point-Based Methods for Model Checking in Partially Observable Markov Decision Processes
TLDR
This work shows how to use point-based value iteration methods to efficiently approximate the maximum probability of satisfying a desired logical formula and compute the associated belief state policy in a partially observable Markov decision process (POMDP).
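A minimal sketch of the point-based backup such methods rest on, run on a randomly generated toy POMDP (sizes, seed, and the fixed belief set are our assumptions, not the paper's benchmarks):

```python
# Hedged sketch: one point-based Bellman backup over alpha-vectors, repeated
# at a fixed set of belief points.
import numpy as np

rng = np.random.default_rng(0)
S, A, O, gamma = 3, 2, 2, 0.95
T = rng.dirichlet(np.ones(S), size=(A, S))    # T[a, s, s'] transition probs
Z = rng.dirichlet(np.ones(O), size=(A, S))    # Z[a, s', o] observation probs
R = rng.random((A, S))                        # R[a, s] immediate reward

def backup(b, alphas):
    """One point-based backup at belief b given the current alpha set."""
    best_val, best = -np.inf, None
    for a in range(A):
        alpha_a = R[a].copy()
        for o in range(O):
            # g[i, s] = sum_{s'} T[a, s, s'] * Z[a, s', o] * alphas[i][s']
            g = np.array([T[a] @ (Z[a, :, o] * al) for al in alphas])
            alpha_a += gamma * g[int(np.argmax(g @ b))]  # best alpha for (a, o)
        if alpha_a @ b > best_val:
            best_val, best = alpha_a @ b, alpha_a
    return best

beliefs = [np.eye(S)[i] for i in range(S)] + [np.full(S, 1 / S)]
alphas = [np.zeros(S)]
for _ in range(100):                          # value iteration at the points
    alphas = [backup(b, alphas) for b in beliefs]
print([round(max(al @ b for al in alphas), 3) for b in beliefs])
```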
Gradient-Descent for Randomized Controllers under Partial Observability
TLDR
This paper shows how to define and evaluate gradients of pMCs and investigates varieties of gradient descent techniques from the machine learning community to synthesize the probabilities in a pMC.
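Continuing the toy pMC from the parameter-synthesis sketch above: once reachability is a closed-form rational function, its gradient is exact and plain gradient ascent applies (step size, clipping, and iteration count are illustrative choices, not the paper's techniques):

```python
# Hedged sketch: symbolic gradient of a pMC's reachability probability,
# followed by plain gradient-ascent steps kept inside the parameter region.
import sympy as sp

p = sp.Symbol('p')
reach = p + (1 - p)**2                 # Pr[reach goal] for the toy pMC
grad = sp.lambdify(p, sp.diff(reach, p))
f = sp.lambdify(p, reach)

val, lr = 0.6, 0.05
for _ in range(200):
    val = min(max(val + lr * grad(val), 1e-6), 1 - 1e-6)  # stay in (0, 1)
print(round(val, 4), round(f(val), 4))  # climbs toward p -> 1, reach -> 1
```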
Human-in-the-Loop Synthesis for Partially Observable Markov Decision Processes
TLDR
Experiments show that by including humans into the POMDP verification loop the authors improve the state of the art by orders of magnitude in terms of scalability.
Unpredictable Planning Under Partial Observability
TLDR
It is proved that a decision-maker with perfect observations can randomize its paths at least as well as a decision-maker with partial observations, and it is shown that the maximum entropy of a POMDP is lower bounded by the maximum entropy of this pMC.
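Read as an inequality, in our notation rather than the paper's: if $\mathcal{D}[\theta]$ denotes the pMC obtained by applying a parameterized memoryless observation-based controller to the POMDP, the blurb asserts

$$H_{\max}(\text{POMDP}) \;\ge\; \max_{\theta} H\big(\mathcal{D}[\theta]\big),$$

so maximizing the pMC's entropy over $\theta$ yields a sound lower bound on the POMDP's maximum entropy.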
Convex Optimization meets Parameter Synthesis for MDPs
TLDR
Model checking is a well-studied technique that provides guarantees on appropriate behavior for all possible events and scenarios and can be applied to systems with stochastic uncertainties, including discrete-time Markov chains, Markov decision processes (MDPs), and their continuous-time counterparts.
Enforcing Almost-Sure Reachability in POMDPs
TLDR
This work presents an iterative symbolic approach that computes a winning region, that is, a set of system configurations such that all policies that stay within this set are guaranteed to satisfy the constraints.
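The fixed-point shape behind such winning regions, sketched on an explicit toy MDP rather than the paper's symbolic belief-support construction (the model, names, and the absorbing-goal assumption are ours):

```python
# Hedged sketch: the largest set W such that every state in W has an action
# keeping all successors inside W while the goal stays reachable with
# positive probability inside W.
def almost_sure_winning(states, actions, succ, goal):
    """succ(s, a) -> set of possible successors; goal states are absorbing."""
    W = set(states)
    while True:
        allowed = {s: [a for a in actions if succ(s, a) <= W] for s in W}
        R = set(goal) & W                     # backward reachability inside W
        changed = True
        while changed:
            changed = False
            for s in W - R:
                if any(succ(s, a) & R for a in allowed[s]):
                    R.add(s)
                    changed = True
        if R == W:
            return W                          # greatest fixed point reached
        W = R

succ_map = {('s0', 'a'): {'s0', 's1'}, ('s0', 'b'): {'trap'},
            ('s1', 'a'): {'goal'},     ('s1', 'b'): {'s0'},
            ('trap', 'a'): {'trap'},   ('trap', 'b'): {'trap'},
            ('goal', 'a'): {'goal'},   ('goal', 'b'): {'goal'}}
print(almost_sure_winning({'s0', 's1', 'trap', 'goal'}, {'a', 'b'},
                          lambda s, a: succ_map[s, a], {'goal'}))
# -> {'s0', 's1', 'goal'}: any policy staying in this set and playing 'a'
#    from s0/s1 reaches the goal almost surely; 'trap' is excluded.
```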
Robust Policy Synthesis for Uncertain POMDPs via Convex Optimization
TLDR
The feasibility of the approach, which provides a transformation of the problem to a convex QCQP with finitely many constraints, is demonstrated by means of several case studies that highlight typical bottlenecks for the problem.
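For context, a convex QCQP is a program of the generic textbook form (our notation, not the paper's concrete instance):

$$\min_{x}\; x^{\top} Q_0 x + q_0^{\top} x \quad \text{s.t.} \quad x^{\top} Q_i x + q_i^{\top} x + r_i \le 0,\quad i = 1,\dots,m,$$

which is convex whenever every $Q_i$ is positive semidefinite; the cited transformation produces finitely many constraints of this shape.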
...

References

SHOWING 1-10 OF 55 REFERENCES
POMDP solution methods
This is an overview of partially observable Markov decision processes (POMDPs). We describe POMDP value and policy iteration as well as gradient ascent algorithms. The emphasis is on solution methods that work directly in the space of policies.
Probabilistic robotics
TLDR
This research presents a novel approach to planning and navigation algorithms that exploit statistics gleaned from uncertain, imperfect real-world environments to guide robots toward their goals and around obstacles.
Safety-Constrained Reinforcement Learning for MDPs
TLDR
This work casts controller synthesis for stochastic and partially unknown environments in which safety is essential as a Markov decision process whose expected performance is measured using a cost function that is unknown prior to run-time exploration of the state space.
Planning and Acting in Partially Observable Stochastic Domains
On the Computational Complexity of Stochastic Controller Optimization in POMDPs
TLDR
The result establishes that the more general problem of stochastic controller optimization in POMDPs is also NP-hard, and outlines a special case that is convex and admits efficient global solutions.
An Optimal Best-First Search Algorithm for Solving Infinite Horizon DEC-POMDPs
TLDR
This work develops the first complete and optimal algorithm that is able to extract deterministic policy vectors based on finite state controllers for a cooperative team of agents and extends best-first search methods to the domain of decentralized control theory.
Permissive Controller Synthesis for Probabilistic Systems
TLDR
A permissive controller synthesis framework is developed, which generates multi-strategies for the controller, offering a choice of control actions to take at each time step, and formalises the notion of permissivity using penalties.
Control of probabilistic systems under dynamic, partially known environments with temporal logic specifications
TLDR
This work considers the synthesis of control policies for probabilistic systems, modeled by Markov decision processes, operating in partially known environments with temporal logic specifications, using Markov chains to describe the behavior of the environment in each mode.
Finding Approximate POMDP solutions Through Belief Compression
TLDR
This thesis describes a scalable approach to POMDP planning which uses low-dimensional representations of the belief space and demonstrates how to make use of a variant of Principal Components Analysis (PCA) called Exponential family PCA in order to compress certain kinds of large real-world POMDPs, and find policies for these problems.
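The compress-and-reconstruct loop behind belief compression, sketched with plain linear PCA via SVD in place of the Exponential-family PCA variant the thesis actually develops (sizes, seed, and the renormalization step are our choices):

```python
# Hedged sketch: project sampled belief vectors onto a low-dimensional
# subspace, reconstruct, and renormalize back onto the probability simplex.
import numpy as np

rng = np.random.default_rng(0)
beliefs = rng.dirichlet(np.ones(50), size=200)   # 200 beliefs over 50 states

mean = beliefs.mean(axis=0)
_, _, Vt = np.linalg.svd(beliefs - mean, full_matrices=False)
k = 3                                            # compressed dimension
codes = (beliefs - mean) @ Vt[:k].T              # plan in k dims, not 50
recon = codes @ Vt[:k] + mean                    # lift back to belief space

recon = np.clip(recon, 1e-12, None)              # linear PCA can leave the
recon /= recon.sum(axis=1, keepdims=True)        # simplex; project back
print(round(float(np.abs(recon - beliefs).max()), 4))
```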
A Symbolic SAT-Based Algorithm for Almost-Sure Reachability with Small Strategies in POMDPs
TLDR
This work first studies the existence of observation-stationary strategies, which is NP-complete, and then small-memory strategies, and presents a symbolic algorithm that efficiently encodes the almost-sure reachability problem into SAT and solves it with a SAT solver.
...