Learn More
Classic heuristic search algorithms can find solutions that take the form of a simple path (A*), a tree, or an acyclic graph (AO*). In this paper, we describe a novel generalization of heuristic search, called LAO*, that can find solutions with loops. We show that LAO* can be used to solve Markov decision problems and that it shares the advantage heuristic(More)
We develop an exact dynamic programming algorithm for partially observable stochastic games (POSGs). The algorithm is a synthesis of dynamic programming for partially observable Markov decision processes (POMDPs) and iterative elimination of dominated strategies in normal form games. We prove that it iteratively eliminates very weakly dominated strategies(More)
Most algorithms for solving POMDPs iteratively improve a value function that implicitly represents a policy and are said to search in value function space. This paper presents an approach to solving POMDPs that represents a policy explicitly as a nite-state controller and iteratively improves the controller by search in policy space. Two related algorithms(More)
Recent work shows that the memory requirements of best-first heuristic search can be reduced substantially by using a divide-and-conquer method of solution reconstruction. We show that memory requirements can be reduced even further by using a breadth-first instead of a best-first search strategy. We describe optimal and approximate breadth-first heuristic(More)
We present a bounded policy iteration algorithm for infinite-horizon decentralized POMDPs. Policies are represented as joint stochastic finite-state controllers , which consist of a local controller for each agent. We also let a joint controller include a correlation device that allows the agents to correlate their behavior without exchanging information(More)
Anytime algorithms offer a tradeoff between solution quality and computation time that has proved useful in solving time-critical problems such as planning and scheduling, belief network evaluation, and information gathering. To exploit this tradeoff, a system must be able to decide when to stop deliberation and act on the currently available solution. This(More)