Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes

@article{Zhang2011SpeedingUT,
  title={Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes},
  author={Nevin Lianwen Zhang and Weihong Zhang},
  journal={ArXiv},
  year={2011},
  volume={abs/1106.0251}
}
Partially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a well-known algorithm for finding optimal policies for POMDPs. It typically takes a large number of iterations to converge. This paper proposes a method for accelerating the convergence of value iteration. The method has been evaluated on an array of benchmark problems and was found to be very… 
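
To make the setting concrete, here is a minimal point-based value-iteration sketch in Python/NumPy on the classic tiger POMDP. It is a generic baseline illustration only, not the paper's acceleration method; the model arrays, belief grid, and the `backup` helper are assumptions chosen for the example.

```python
# Minimal sketch: plain point-based value iteration on the tiger POMDP.
# Generic baseline, NOT the paper's acceleration method; model arrays,
# belief grid, and helper names are illustrative only.
import numpy as np

S, A, Z = 2, 3, 2                      # states, actions, observations
gamma = 0.95
# T[a, s, s']: listening keeps the state; opening a door resets it.
T = np.array([np.eye(2), np.full((2, 2), 0.5), np.full((2, 2), 0.5)])
# O[a, s', z]: listening is 85% accurate; opening is uninformative.
O = np.array([[[0.85, 0.15], [0.15, 0.85]],
              [[0.50, 0.50], [0.50, 0.50]],
              [[0.50, 0.50], [0.50, 0.50]]])
# R[a, s]: listening costs 1; opening the tiger's door costs 100, else +10.
R = np.array([[-1.0, -1.0], [-100.0, 10.0], [10.0, -100.0]])

def backup(b, Gamma):
    """One Bellman (DP) backup at belief b against alpha-vector set Gamma."""
    best_val, best_alpha = -np.inf, None
    for a in range(A):
        alpha = R[a].copy()
        for z in range(Z):
            # Back-project every alpha-vector through action a, observation z.
            cands = [gamma * ((T[a] * O[a][:, z]) @ g) for g in Gamma]
            alpha = alpha + max(cands, key=lambda c: float(b @ c))
        if b @ alpha > best_val:
            best_val, best_alpha = float(b @ alpha), alpha
    return best_alpha

B = [np.array([p, 1.0 - p]) for p in np.linspace(0.0, 1.0, 11)]  # belief grid
Gamma = [np.zeros(S)]
for it in range(500):                  # many sweeps before convergence
    new_Gamma = [backup(b, Gamma) for b in B]
    delta = max(abs(max(b @ g for g in new_Gamma) - max(b @ g for g in Gamma))
                for b in B)
    Gamma = new_Gamma
    if delta < 1e-4:
        break
print(f"converged after {it + 1} sweeps, residual {delta:.2e}")
```

With a discount of 0.95 the Bellman residual shrinks by roughly a factor of 0.95 per sweep, so on the order of a few hundred sweeps are needed; this slow geometric convergence is exactly what the paper sets out to reduce.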

Algorithms for partially observable Markov decision processes

Two ways to accelerate value iteration are investigated: reducing the number of DP updates needed for convergence, and performing value iteration over a belief subspace, a subset of the belief space, which is more efficient for this POMDP class.

A (Revised) Survey of Approximate Methods for Solving Partially Observable Markov Decision Processes

Two versions of the POMDP training problem are explored: learning when a model of the POMDP is known, and the much harder problem of learning when a model is not available.

Solving Informative Partially Observable Markov Decision Processes

Solving Partially Observable Markov Decision Processes (POMDPs) generally is computationally intractable. In this paper, we study a special POMDP class, namely informative POMDPs, where each

Exploiting structure to efficiently solve large scale partially observable Markov decision processes

This thesis first presents a Bounded Policy Iteration algorithm to robustly find a good policy represented by a small finite state controller, and describes three approaches that combine techniques capable of dealing with each source of intractability: VDC with BPI, VDC with Perseus, and state abstraction with Perseus.

Space-Progressive Value Iteration: An Anytime Algorithm for a Class of POMDPs

An anytime algorithm called space-progressive value iteration (SPVI) is proposed, and it is argued that, given sufficient time, SPVI can find near-optimal policies for almost-discernible POMDPs.

Online Planning Algorithms for POMDPs

The objectives here are to survey the various existing online POMDP methods, analyze their properties and discuss their advantages and disadvantages; and to thoroughly evaluate these online approaches in different environments under various metrics.

On Anderson Acceleration for Partially Observable Markov Decision Processes

This paper proposes an accelerated method for approximately solving partially observable Markov decision process (POMDP) problems offline. Our method carefully combines two existing tools: Anderson acceleration…
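
For context, Anderson acceleration is a general scheme for speeding up fixed-point iterations. The sketch below applies it to an abstract map x = g(x); it is not the POMDP-specific combination proposed in that paper, and the function and parameter names are illustrative.

```python
# Minimal sketch of Anderson acceleration (type II) for a fixed point
# x = g(x). Generic illustration, not the POMDP-specific method above;
# production implementations add regularization and safeguarding.
import numpy as np

def anderson(g, x0, m=5, tol=1e-10, max_iter=100):
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    G_hist, F_hist = [], []                      # past g(x) values, residuals
    for k in range(max_iter):
        gx = g(x)
        f = gx - x                               # fixed-point residual
        if np.linalg.norm(f) < tol:
            return x, k
        G_hist.append(gx); F_hist.append(f)
        G_hist, F_hist = G_hist[-(m + 1):], F_hist[-(m + 1):]
        if len(F_hist) == 1:
            x = gx                               # plain iteration to start
        else:
            dF = np.column_stack([F_hist[i + 1] - F_hist[i]
                                  for i in range(len(F_hist) - 1)])
            dG = np.column_stack([G_hist[i + 1] - G_hist[i]
                                  for i in range(len(G_hist) - 1)])
            # Least-squares mixing of recent residuals, then extrapolate.
            coef, *_ = np.linalg.lstsq(dF, f, rcond=None)
            x = gx - dG @ coef
    return x, max_iter

# Example: x = cos(x); plain iteration needs ~60 steps, AA far fewer.
sol, iters = anderson(np.cos, 1.0)
print(sol, iters)
```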

Structural Results for POMDPs

The optimal policy structures for the general POMDP model are discussed and examples of such structures in the context of maintenance optimization are provided.

Reduction of a POMDP to an MDP

A partially observable Markov decision process (POMDP) is an appropriate mathematical modeling tool for dynamic stochastic systems where portions or all of the system states are not completely observable.
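
The core of this reduction is that the belief state is a sufficient statistic, so the POMDP becomes a fully observable MDP over beliefs. Below is a minimal sketch of the Bayes belief update that drives the reduction; the toy numbers and array shapes are hypothetical, not taken from the paper.

```python
# Minimal sketch of the belief update behind the POMDP-to-belief-MDP
# reduction: b'(s') ∝ O[a, s', z] * sum_s b(s) T[a, s, s'].
# Toy numbers and array shapes are hypothetical, not from the paper.
import numpy as np

def belief_update(b, a, z, T, O):
    unnorm = O[a][:, z] * (b @ T[a])   # predict with T, correct with O
    return unnorm / unnorm.sum()       # renormalize to a distribution

T = np.array([[[0.9, 0.1], [0.2, 0.8]]])   # one action, two states
O = np.array([[[0.7, 0.3], [0.4, 0.6]]])   # two observations
b = np.array([0.5, 0.5])
print(belief_update(b, 0, 1, T, O))        # posterior after observing z=1
```

Value iteration can then be run on these belief states, which is what makes the belief-space algorithms in the entries above well defined.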

Bounded-parameter Partially Observable Markov Decision Processes: Framework and Algorithm

A modified value iteration is proposed as a basic strategy for tackling parameter imprecision in BPOMDPs, and a UL-based value iteration algorithm is designed in which each value backup is based on two sets of vectors, called the U-set and the L-set.
...

References

SHOWING 1-10 OF 55 REFERENCES

A Method for Speeding Up Value Iteration in Partially Observable Markov Decision Processes

A technique for speeding up the convergence of value iteration for partially observable Markov decision processes (POMDPs) that can be easily incorporated into any existing POMDP value iteration algorithm.

Algorithms for partially observable Markov decision processes

Two ways to accelerate value iteration are investigated: reducing the number of DP updates needed for convergence, and performing value iteration over a belief subspace, a subset of the belief space, which is more efficient for this POMDP class.

Solution Procedures for Partially Observed Markov Decision Processes

Three algorithms to solve the infinite-horizon, expected discounted total reward partially observed Markov decision process (POMDP), with an appropriately generalized numerical technique that has been shown to reduce CPU time until convergence in the completely observed case.

Value-Function Approximations for Partially Observable Markov Decision Processes

This work surveys various approximation methods, analyzes their properties and relations, provides some new insights into their differences, and presents a number of new approximation methods and novel refinements of existing techniques.

Incremental Pruning: A Simple, Fast, Exact Method for Partially Observable Markov Decision Processes

It is found that incremental pruning is presently the most efficient exact method for solving POMDPs.
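
As background, the "pruning" in incremental pruning removes dominated alpha-vectors between cross-sum steps. The snippet below shows only the cheap pointwise-dominance prefilter; the full algorithm additionally uses linear programs to detect all dominated vectors, which is not reproduced here.

```python
# Minimal sketch of pointwise-dominance pruning of alpha-vectors.
# A cheap prefilter only; full incremental pruning also uses LP-based
# pruning to remove every vector that is nowhere maximal.
import numpy as np

def pointwise_prune(vectors):
    """Drop alpha-vectors dominated componentwise by another vector."""
    kept = []
    for i, v in enumerate(vectors):
        dominated = any(j != i and np.all(w >= v) and np.any(w > v)
                        for j, w in enumerate(vectors))
        if not dominated:
            kept.append(v)
    return kept

vs = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.5, 0.5]),
      np.array([0.4, 0.4])]            # last vector is dominated by [0.5, 0.5]
print(pointwise_prune(vs))
```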

Efficient dynamic-programming updates in partially observable Markov decision processes

A new algorithm is offered, called the witness algorithm, which can compute updated value functions efficiently on a restricted class of POMDPs in which the number of linear facets is not too great, and it is found to be the fastest algorithm over a wide range of POMDP sizes.

A Heuristic Variable Grid Solution Method for POMDPs

A simple variable-grid solution method which yields good results on relatively large problems with modest computational effort is described.

The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs

The paper develops easily implemented approximations to stationary policies based on finitely transient policies and shows that the concave hull of an approximation can be included in the well-known Howard policy improvement algorithm with subsequent convergence.

Solving POMDPs by Searching in Policy Space

An approach to solving POMDPs is presented that represents a policy explicitly as a finite-state controller and iteratively improves the controller by search in policy space, which provides the foundation for a new heuristic search algorithm.
...