# Goal-HSVI: Heuristic Search Value Iteration for Goal POMDPs

```bibtex
@inproceedings{Hork2018GoalHSVIHS,
  title     = {Goal-HSVI: Heuristic Search Value Iteration for Goal POMDPs},
  author    = {K. Hor{\'a}k and Branislav Bosansk{\'y} and Krishnendu Chatterjee},
  booktitle = {IJCAI},
  year      = {2018}
}
```

Partially observable Markov decision processes (POMDPs) are the standard models for planning under uncertainty with both finite and infinite horizons. Besides the well-known discounted-sum objective, the indefinite-horizon objective (a.k.a. Goal-POMDPs) is another classical objective for POMDPs. In this case, given a set of target states and a positive cost for each transition, the optimization objective is to minimize the expected total cost until a target state is reached.
In the literature, RTDP…
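The objective above can be illustrated with a minimal sketch: value iteration on the fully observable special case of a Goal-POMDP (a goal MDP), where the value of a state is the minimal expected total cost to reach a target. The function name, data layout, and toy chain model below are illustrative assumptions, not from the paper.

```python
# Minimal value-iteration sketch for a fully observable goal MDP,
# the special case of a Goal-POMDP with perfect observations.
# Bellman update: V(s) = min_a [ c(s,a) + sum_{s'} P(s'|s,a) * V(s') ],
# with V(s) = 0 for target states.

def goal_value_iteration(states, actions, cost, trans, goal, iters=10_000, tol=1e-9):
    """Return V(s): expected total cost to reach `goal` under an optimal policy.

    cost[s][a]  -> positive cost of taking action a in state s
    trans[s][a] -> dict {s_next: probability}
    """
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        delta = 0.0
        for s in states:
            if s in goal:
                continue  # target states are absorbing with zero cost
            best = min(
                cost[s][a] + sum(p * V[s2] for s2, p in trans[s][a].items())
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    return V

# Toy chain: states 0 and 1 must reach target state 2; each "move" costs 1
# and succeeds with probability 0.9, otherwise the state is unchanged.
states = [0, 1, 2]
actions = ["move"]
cost = {0: {"move": 1.0}, 1: {"move": 1.0}}
trans = {0: {"move": {0: 0.1, 1: 0.9}}, 1: {"move": {1: 0.1, 2: 0.9}}}
V = goal_value_iteration(states, actions, cost, trans, goal={2})
print(V[0])  # each step costs 1/0.9 in expectation, so V(0) = 2/0.9 ≈ 2.222
```

HSVI-style methods such as Goal-HSVI work on the belief-space generalization of this recursion, maintaining upper and lower bounds on the value function instead of a single table.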

## 9 Citations

### Under-Approximating Expected Total Rewards in POMDPs

- Computer Science, TACAS
- 2022

This work considers the problem of whether the optimal expected total reward to reach a goal state in a partially observable Markov decision process (POMDP) is below a given threshold, and provides two techniques: a simple cut-off technique that uses a good policy on the POMDP, and a more advanced technique (belief clipping) that uses minimal shifts of probabilities between beliefs.

### Solving Partially Observable Stochastic Shortest-Path Games

- Computer Science, IJCAI
- 2021

A novel heuristic search value iteration algorithm is introduced that iteratively solves depth-limited variants of the game, together with a bound on the depth that guarantees arbitrary precision.

### Enforcing Almost-Sure Reachability in POMDPs

- Computer Science, CAV
- 2021

This work presents an iterative symbolic approach that computes a winning region, that is, a set of system configurations such that all policies that stay within this set are guaranteed to satisfy the constraints.

### Solving Zero-Sum One-Sided Partially Observable Stochastic Games

- Computer Science, ArXiv
- 2020

This work provides a theoretical analysis of one-sided POSGs, shows that a variant of a value-iteration algorithm converges in this setting, and demonstrates the scalability of the algorithm in three different domains: pursuit-evasion, patrolling, and search games.

### Verification of indefinite-horizon POMDPs

- Computer Science, ATVA
- 2020

This paper considers the verification problem for partially observable MDPs and presents an abstraction-refinement framework extending previous instantiations of the Lovejoy approach, showing that this framework significantly improves the scalability of the approach.

### Gradient-Descent for Randomized Controllers under Partial Observability

- Computer Science, VMCAI
- 2022

This paper shows how to define and evaluate gradients of pMCs and investigates varieties of gradient descent techniques from the machine learning community to synthesize the probabilities in a pMC.

### Runtime Monitoring for Markov Decision Processes

- Computer Science, ArXiv
- 2021

This work investigates the problem of monitoring partially observable systems with nondeterministic and probabilistic dynamics, presents a tractable algorithm based on model checking conditional reachability probabilities, and demonstrates the applicability of the algorithms on a range of benchmarks.

### The Probabilistic Model Checker Storm

- Computer Science, Int. J. Softw. Tools Technol. Transf.
- 2022

The main features of Storm are reported, their effective use is explained, and an empirical evaluation of different configurations of Storm on the QComp 2019 benchmark set is presented.

### Runtime Monitors for Markov Decision Processes

- Computer Science, CAV
- 2021

This work investigates the problem of monitoring partially observable systems with nondeterministic and probabilistic dynamics, presents a tractable algorithm based on model checking conditional reachability probabilities, and demonstrates the applicability of the algorithms on a range of benchmarks.

## References

Showing 1–10 of 27 references

### Solving Large POMDPs using Real Time Dynamic Programming

- Computer Science
- 1998

A new POMDP algorithm is introduced that combines the benefits of optimal and heuristic procedures, producing good solutions quickly even in large problems; experiments suggest that large POMDPs are solved quickly and consistently, and that solutions, if not optimal, tend to be very good.

### Optimal cost almost-sure reachability in POMDPs

- Mathematics, Computer Science, Artif. Intell.
- 2016

### Value-Function Approximations for Partially Observable Markov Decision Processes

- Computer Science, J. Artif. Intell. Res.
- 2000

This work surveys various approximation methods, analyzes their properties and relations, and provides some new insights into their differences; it also presents a number of new approximation methods and novel refinements of existing techniques.

### Heuristic Search for Generalized Stochastic Shortest Path MDPs

- Computer Science, ICAPS
- 2011

A new heuristic-search-based family of algorithms, FRET (Find, Revise, Eliminate Traps), is presented and a preliminary empirical evaluation shows that FRET solves GSSPs much more efficiently than Value Iteration.

### SARSOP: Efficient Point-Based POMDP Planning by Approximating Optimally Reachable Belief Spaces

- Computer Science, Robotics: Science and Systems
- 2008

This work has developed a new point-based POMDP algorithm that exploits the notion of optimally reachable belief spaces to improve computational efficiency and substantially outperformed one of the fastest existing point-based algorithms.

### Learning Policies for Partially Observable Environments: Scaling Up

- Computer Science, ICML
- 1995

### Probabilistic planning for robotic exploration

- Computer Science
- 2007

This work demonstrates planning algorithms that generate robot control policies for partially observable Markov decision process (POMDP) planning problems, and the relevance of onboard science data analysis and POMDP planning to robotic exploration.

### The Complexity of Markov Decision Processes

- Computer Science, Math. Oper. Res.
- 1987

All three variants of the classical problem of optimal policy computation in Markov decision processes (finite horizon, infinite-horizon discounted, and infinite-horizon average cost) are shown to be complete for P, and therefore most likely cannot be solved by highly parallel algorithms.

### On partially observed stochastic shortest path problems

- Mathematics, Proceedings of the 40th IEEE Conference on Decision and Control (Cat. No.01CH37228)
- 2001

We analyze a class of partially observed stochastic shortest path problems. These are terminating Markov decision process with imperfect state information that evolve on an infinite time horizon and…

### Exploiting Fully Observable and Deterministic Structures in Goal POMDPs

- Computer Science, ICAPS
- 2013

Theoretical results show how a POMDP can be analyzed to identify exploitable properties, and formal guarantees are provided showing that the use of macro actions preserves solvability.