Goal-HSVI: Heuristic Search Value Iteration for Goal POMDPs

@inproceedings{Hork2018GoalHSVIHS,
  title={Goal-HSVI: Heuristic Search Value Iteration for Goal POMDPs},
  author={K. Hor{\'a}k and Branislav Bo{\v{s}}ansk{\'y} and Krishnendu Chatterjee},
  booktitle={IJCAI},
  year={2018}
}
Partially observable Markov decision processes (POMDPs) are the standard models for planning under uncertainty with both finite and infinite horizon. Besides the well-known discounted-sum objective, the indefinite-horizon objective (a.k.a. Goal-POMDPs) is another classical objective for POMDPs. In this case, given a set of target states and a positive cost for each transition, the optimization objective is to minimize the expected total cost until a target state is reached. In the literature, RTDP…
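For orientation, the indefinite-horizon objective can be written in the standard stochastic-shortest-path form (a sketch in our own notation, not a quotation from the paper): given an initial belief b_0, a goal set G, and strictly positive transition costs c(s, a), the task is to find

  V^*(b_0) = \min_{\pi} \mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{T_G - 1} c(s_t, a_t) \,\middle|\, b_0 \right], \qquad T_G = \min\{\, t : s_t \in G \,\}.

Since every step incurs positive cost and there is no discount factor to bound the tail of the sum, only policies that reach G almost surely attain a finite value.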

Citations

Under-Approximating Expected Total Rewards in POMDPs

TLDR
This work considers the problem of whether the optimal expected total reward to reach a goal state in a partially observable Markov decision process (POMDP) is below a given threshold, and provides two techniques: a simple cut-off technique that uses a good policy on the POMDP, and a more advanced belief-clipping technique that uses minimal shifts of probabilities between beliefs.

Solving Partially Observable Stochastic Shortest-Path Games

TLDR
A novel heuristic search value iteration algorithm is introduced that iteratively solves depth-limited variants of the game, together with a bound on the depth that guarantees arbitrary precision.

Enforcing Almost-Sure Reachability in POMDPs

TLDR
This work presents an iterative symbolic approach that computes a winning region, that is, a set of system configurations such that all policies that stay within this set are guaranteed to satisfy the constraints.

Solving Zero-Sum One-Sided Partially Observable Stochastic Games

TLDR
This work provides a theoretical analysis of one-sided POSGs and shows that a variant of a value-iteration algorithm converges in this setting and demonstrates the scalability of the algorithm in three different domains: pursuit-evasion, patrolling, and search games.

Verification of indefinite-horizon POMDPs

TLDR
This paper considers the verification problem for partially observable MDPs and presents an abstraction-refinement framework extending previous instantiations of the Lovejoy approach, showing that this framework significantly improves the scalability of the approach.

Gradient-Descent for Randomized Controllers under Partial Observability

TLDR
This paper shows how to define and evaluate gradients of pMCs and investigates varieties of gradient descent techniques from the machine learning community to synthesize the probabilities in a pMC.

Runtime Monitoring for Markov Decision Processes

TLDR
This work investigates the problem of monitoring partially observable systems with nondeterministic and probabilistic dynamics and presents a tractable algorithm based on model checking conditional reachability probabilities; the applicability of the algorithm is demonstrated on a range of benchmarks.

The Probabilistic Model Checker Storm

TLDR
The main features of Storm are reported, their effective use is explained, and an empirical evaluation of different configurations of Storm on the QComp 2019 benchmark set is presented.

References

SHOWING 1-10 OF 27 REFERENCES

Solving Large POMDPs using Real Time Dynamic Programming

TLDR
A new POMDP algorithm is introduced that combines the benefits of optimal and heuristic procedures, producing good solutions quickly even in large problems; experiments suggest that large POMDPs are solved quickly and consistently, and that the solutions, if not optimal, tend to be very good.
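To make the trial-based idea concrete, below is a minimal, self-contained Python sketch of an RTDP-Bel-style trial on a toy Goal-POMDP. The model (a 4-cell corridor with a noisy "near"/"far" sensor), the function names (T, Z, tau, disc, trial), and the discretization grid are all our own illustrative assumptions, not code from the paper: values are kept in a hash table over discretized beliefs, actions are chosen greedily, and the current belief's value is updated by a Bellman backup before sampling the next state and observation.

import random
from collections import defaultdict

# Toy Goal-POMDP (hypothetical, for illustration only): a 4-cell corridor,
# goal in cell 3, unit cost per step, noisy "near"/"far" position sensor.
S = [0, 1, 2, 3]
A = ["left", "right"]
OBS = ["near", "far"]
GOAL = {3}

def T(s, a):
    # Transition distribution {s': prob}; the goal cell is absorbing.
    if s in GOAL:
        return {s: 1.0}
    s2 = min(max(s + (1 if a == "right" else -1), 0), 3)
    return {s2: 0.9, s: 0.1} if s2 != s else {s: 1.0}

def Z(s2, o):
    # Observation model P(o | s'): "near" is 85% accurate for cells 2-3.
    return 0.85 if (o == "near") == (s2 >= 2) else 0.15

def tau(b, a, o):
    # Standard Bayes belief update; returns None when P(o | b, a) = 0.
    nb = {s2: Z(s2, o) * sum(p * T(s, a).get(s2, 0.0) for s, p in b.items())
          for s2 in S}
    norm = sum(nb.values())
    return {s: p / norm for s, p in nb.items()} if norm > 0 else None

def p_obs(b, a, o):
    return sum(p * T(s, a).get(s2, 0.0) * Z(s2, o)
               for s, p in b.items() for s2 in S)

def disc(b, grid=10):
    # RTDP-Bel's trick: hash beliefs by discretizing their probabilities.
    return tuple(round(b.get(s, 0.0) * grid) for s in S)

V = defaultdict(float)  # cost-to-goal estimates; 0 is admissible for costs >= 0

def Q(b, a):
    # Expected immediate cost (1 unless at the goal) + expected future cost.
    q = sum(p for s, p in b.items() if s not in GOAL)
    for o in OBS:
        po = p_obs(b, a, o)
        if po > 0:
            q += po * V[disc(tau(b, a, o))]
    return q

def trial(b0, max_steps=50):
    # One trial: greedy action w.r.t. Q, Bellman update at the current belief,
    # then sample the next hidden state, observation, and belief.
    b = dict(b0)
    s = random.choices(S, weights=[b.get(s, 0.0) for s in S])[0]
    for _ in range(max_steps):
        if s in GOAL:
            return
        a = min(A, key=lambda act: Q(b, act))
        V[disc(b)] = Q(b, a)
        s = random.choices(list(T(s, a)), weights=list(T(s, a).values()))[0]
        o = random.choices(OBS, weights=[Z(s, obs) for obs in OBS])[0]
        b = tau(b, a, o) or b

b0 = {0: 0.5, 1: 0.5}
for _ in range(300):
    trial(b0)
print("estimated expected cost-to-goal from b0:", min(Q(b0, a) for a in A))

Because costs are positive and undiscounted, the zero-initialized table is an admissible lower bound, and repeated trials drive the estimates at visited beliefs up toward the optimal cost-to-goal.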

Value-Function Approximations for Partially Observable Markov Decision Processes

TLDR
This work surveys various approximation methods, analyzes their properties and relations, provides some new insights into their differences, and presents a number of new approximation methods and novel refinements of existing techniques.

Heuristic Search for Generalized Stochastic Shortest Path MDPs

TLDR
A new heuristic-search-based family of algorithms, FRET (Find, Revise, Eliminate Traps), is presented and a preliminary empirical evaluation shows that FRET solves GSSPs much more efficiently than Value Iteration.

SARSOP: Efficient Point-Based POMDP Planning by Approximating Optimally Reachable Belief Spaces

TLDR
This work has developed a new point-based POMDP algorithm that exploits the notion of optimally reachable belief spaces to improve computational efficiency, and substantially outperformed one of the fastest existing point-based algorithms.

Probabilistic planning for robotic exploration

TLDR
Planning algorithms are presented that generate robot control policies for partially observable Markov decision process (POMDP) planning problems, and the relevance of onboard science data analysis and POMDP planning to robotic exploration is demonstrated.

The Complexity of Markov Decision Processes

TLDR
All three variants of the classical problem of optimal policy computation in Markov decision processes (finite horizon, infinite-horizon discounted, and infinite-horizon average cost) are shown to be complete for P, and therefore most likely cannot be solved by highly parallel algorithms.

On partially observed stochastic shortest path problems

  • S. Patek
  • Mathematics
    Proceedings of the 40th IEEE Conference on Decision and Control (Cat. No.01CH37228)
  • 2001
We analyze a class of partially observed stochastic shortest path problems. These are terminating Markov decision processes with imperfect state information that evolve on an infinite time horizon and…

Exploiting Fully Observable and Deterministic Structures in Goal POMDPs

TLDR
Theoretical results show how a POMDP can be analyzed to identify exploitable properties, and formal guarantees are provided showing that the use of macro actions preserves solvability.