# Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes

@article{Zhang2011SpeedingUT,
  title   = {Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes},
  author  = {Nevin Lianwen Zhang and Weihong Zhang},
  journal = {ArXiv},
  year    = {2011},
  volume  = {abs/1106.0251}
}

Partially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a well-known algorithm for finding optimal policies for POMDPs. It typically takes a large number of iterations to converge. This paper proposes a method for accelerating the convergence of value iteration. The method has been evaluated on an array of benchmark problems and was found to be very…
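The iterations the abstract refers to are repeated applications of the Bellman (dynamic-programming) backup. As a minimal illustration of that backup, here is standard value iteration for a fully observable MDP with made-up toy numbers; the POMDP case in the paper works analogously but over belief states, with value functions represented by sets of linear functions (alpha-vectors) rather than a vector over states:

```python
import numpy as np

# Hypothetical 2-state, 2-action toy MDP (illustrative numbers, not from the paper).
P = np.array([                    # P[a, s, s'] = transition probability
    [[0.8, 0.2], [0.1, 0.9]],     # action 0
    [[0.5, 0.5], [0.3, 0.7]],     # action 1
])
R = np.array([[1.0, 0.0],         # R[a, s] = expected immediate reward
              [0.0, 2.0]])
gamma = 0.9                       # discount factor

def value_iteration(P, R, gamma, tol=1e-8):
    """Repeat the Bellman optimality backup until the value change is < tol."""
    V = np.zeros(P.shape[1])
    while True:
        # Q[a, s] = R[a, s] + gamma * sum_s' P[a, s, s'] * V[s']
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

V, policy = value_iteration(P, R, gamma)
```

Convergence is geometric at rate `gamma`, which is why many iterations are needed when the discount factor is close to 1; that slow contraction is the cost the paper's acceleration method targets.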

## 145 Citations

### Algorithms for partially observable markov decision processes

- Computer Science
- 2001

Two ways to accelerate value iteration are investigated: reducing the number of DP updates, and performing value iteration over a belief subspace (a subset of the belief space), which is more efficient for this POMDP class.

### A ( Revised ) Survey of Approximate Methods for Solving Partially Observable Markov Decision Processes

- Computer Science
- 2003

Two versions of the POMDP training problem are explored: learning when a model of the POMDP is known, and the much harder problem of learning when a model is not available.

### Solving Informative Partially Observable Markov Decision Processes

- Mathematics
- 2014

Solving Partially Observable Markov Decision Processes (POMDPs) generally is computationally intractable. In this paper, we study a special POMDP class, namely informative POMDPs, where each…

### Exploiting structure to efficiently solve large scale partially observable Markov decision processes

- Computer Science
- 2005

This thesis first presents a Bounded Policy Iteration algorithm to robustly find a good policy represented by a small finite state controller, and describes three approaches that combine techniques capable of dealing with each source of intractability: VDC with BPI, VDC with Perseus, and state abstraction with Perseus.

### Space-Progressive Value Iteration: An Anytime Algorithm for a Class of POMDPs

- Mathematics, ECSQARU
- 2001

An anytime algorithm called space-progressive value iteration (SPVI) is proposed, and it is argued that, given sufficient time, SPVI can find near-optimal policies for almost-discernible POMDPs.

### Online Planning Algorithms for POMDPs

- Computer Science, J. Artif. Intell. Res.
- 2008

The objectives here are to survey the various existing online POMDP methods, analyze their properties and discuss their advantages and disadvantages; and to thoroughly evaluate these online approaches in different environments under various metrics.

### On Anderson Acceleration for Partially Observable Markov Decision Processes

- Computer Science, 2021 60th IEEE Conference on Decision and Control (CDC)
- 2021

This paper proposes an accelerated method for approximately solving partially observable Markov decision process (POMDP) problems offline. Our method carefully combines two existing tools: Anderson…

### Structural Results for POMDPs

- Economics, Computer Science
- 2011

The optimal policy structures for the general POMDP model are discussed and examples of such structures in the context of maintenance optimization are provided.

### Reduction of a POMDP to an MDP

- Mathematics
- 2010

A partially observable Markov decision process (POMDP) is an appropriate mathematical modeling tool for dynamic stochastic systems where portions or all of the system states are not completely…
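The standard reduction alluded to here turns a POMDP into a fully observable MDP over belief states, where the belief is updated by Bayes' rule after each action and observation. A minimal sketch of that belief update, with illustrative numbers for one fixed action (not taken from the paper):

```python
import numpy as np

# Hypothetical 2-state, 2-observation POMDP fragment for one fixed action a.
T = np.array([[0.9, 0.1],      # T[s, s'] = P(s' | s, a)
              [0.2, 0.8]])
O = np.array([[0.7, 0.3],      # O[s', z] = P(z | s', a)
              [0.4, 0.6]])

def belief_update(b, T, O, z):
    """Bayes update of belief b after taking the fixed action and observing z."""
    b_pred = b @ T               # predict: P(s') = sum_s b(s) T[s, s']
    b_new = b_pred * O[:, z]     # correct: weight by observation likelihood
    return b_new / b_new.sum()   # normalize to a probability distribution

b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, T, O, z=0)
```

The resulting belief MDP has a continuous state space (the probability simplex), which is why exact value iteration over beliefs relies on piecewise-linear value-function representations.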

### Bounded-parameter Partially Observable Markov Decision Processes: Framework and Algorithm

- Computer Science, Int. J. Uncertain. Fuzziness Knowl. Based Syst.
- 2013

A modified value iteration is proposed as a basic strategy for tackling parameter imprecision in BPOMDPs, and a UL-based value iteration algorithm is designed, in which each value backup is based on two sets of vectors called the U-set and the L-set.

## References

Showing 1-10 of 55 references

### A Method for Speeding Up Value Iteration in Partially Observable Markov Decision Processes

- Computer Science, UAI
- 1999

A technique for speeding up the convergence of value iteration for partially observable Markov decision processes (POMDPs) that can be easily incorporated into any existing POMDP value iteration algorithm.

### Algorithms for partially observable markov decision processes

- Computer Science
- 2001

Two ways to accelerate value iteration are investigated: reducing the number of DP updates, and performing value iteration over a belief subspace (a subset of the belief space), which is more efficient for this POMDP class.

### Solution Procedures for Partially Observed Markov Decision Processes

- Mathematics, Oper. Res.
- 1989

Three algorithms are presented for solving the infinite-horizon, expected discounted total-reward partially observed Markov decision process (POMDP), together with an appropriately generalized numerical technique that has been shown to reduce CPU time until convergence in the completely observed case.

### Value-Function Approximations for Partially Observable Markov Decision Processes

- Computer Science, J. Artif. Intell. Res.
- 2000

This work surveys various approximation methods, analyzes their properties and relations and provides some new insights into their differences, and presents a number of new approximation methods and novel refinements of existing techniques.

### Incremental Pruning: A Simple, Fast, Exact Method for Partially Observable Markov Decision Processes

- Computer Science, UAI
- 1997

It is found that incremental pruning is presently the most efficient exact method for solving POMDPs.

### Efficient dynamic-programming updates in partially observable Markov decision processes

- Computer Science
- 1995

A new algorithm, called the witness algorithm, is offered; it can compute updated value functions efficiently on a restricted class of POMDPs in which the number of linear facets is not too great, and it is found to be the fastest algorithm over a wide range of POMDP sizes.

### A Heuristic Variable Grid Solution Method for POMDPs

- Computer Science, AAAI/IAAI
- 1997

A simple variable-grid solution method which yields good results on relatively large problems with modest computational effort is described.

### Planning and Acting in Partially Observable Stochastic Domains

- Mathematics, Artif. Intell.
- 1998

### The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs

- Mathematics, Oper. Res.
- 1978

The paper develops easily implemented approximations to stationary policies based on finitely transient policies and shows that the concave hull of an approximation can be included in the well-known Howard policy improvement algorithm with subsequent convergence.

### Solving POMDPs by Searching in Policy Space

- Computer Science, UAI
- 1998

An approach to solving POMDPs is presented that represents a policy explicitly as a finite-state controller and iteratively improves the controller by search in policy space, which provides the foundation for a new heuristic search algorithm.