# Scaling Up Optimal Heuristic Search in Dec-POMDPs via Incremental Expansion

```bibtex
@inproceedings{Spaan2011ScalingUO,
  title     = {Scaling Up Optimal Heuristic Search in Dec-POMDPs via Incremental Expansion},
  author    = {Matthijs T. J. Spaan and Frans A. Oliehoek and Chris Amato},
  booktitle = {IJCAI},
  year      = {2011}
}
```

Planning under uncertainty for multiagent systems can be formalized as a decentralized partially observable Markov decision process. We advance the state of the art for optimal solution of this model, building on the Multiagent A* heuristic search method. A key insight is that we can avoid the full expansion of a search node that generates a number of children that is doubly exponential in the node's depth. Instead, we incrementally expand the children only when a next child might have the…
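The incremental-expansion idea described in the abstract can be illustrated generically: rather than pushing all children of a popped node at once, push only the best child and re-queue a placeholder for the rest. The sketch below is not the paper's GMAA* algorithm, just a minimal best-first search with lazy expansion; `expand`, `value`, and `is_goal` are hypothetical callbacks.

```python
import heapq
from itertools import count

def incremental_best_first(root, expand, value, is_goal):
    """Best-first search with incremental node expansion.

    Only a node's best child is pushed when the node is expanded;
    a placeholder for its remaining children is re-queued at the
    value of the next-best child, so children are materialized only
    when they could actually be the best open node. `expand(n)`
    returns n's children and `value(n)` an optimistic estimate of
    n's best completion (hypothetical interfaces, for illustration).
    """
    tie = count()  # tie-breaker so heapq never compares node objects
    heap = [(-value(root), next(tie), root, None, 0)]
    while heap:
        _, _, node, siblings, idx = heapq.heappop(heap)
        if siblings is None:  # a fresh node, popped for the first time
            if is_goal(node):
                return node
            siblings = sorted(expand(node), key=value, reverse=True)
            idx = 0
        if idx < len(siblings):
            child = siblings[idx]
            heapq.heappush(heap, (-value(child), next(tie), child, None, 0))
            if idx + 1 < len(siblings):
                # lazily re-queue the parent for its next-best child
                nxt = siblings[idx + 1]
                heapq.heappush(heap, (-value(nxt), next(tie), node, siblings, idx + 1))
    return None
```

With an admissible `value`, this visits nodes in the same order as full expansion while keeping the open list small.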

#### 52 Citations

Incremental Clustering and Expansion for Faster Optimal Planning in Dec-POMDPs

- Mathematics, Computer Science
- J. Artif. Intell. Res.
- 2013

This article presents the state of the art in optimal solution methods for decentralized partially observable Markov decision processes (Dec-POMDPs), which are general models for collaborative multiagent planning under uncertainty, and presents extensive empirical results demonstrating that GMAA*-ICE, an algorithm that synthesizes these advances, can optimally solve Dec-POMDPs of unprecedented size.

Scaling Up Decentralized MDPs Through Heuristic Search

- Mathematics, Computer Science
- UAI
- 2012

This work provides an updated proof that an optimal policy depends not on the agents' histories but only on their local observations, and presents a new heuristic search algorithm that expands search nodes using constraint optimization.

Heuristic search of multiagent influence space

- Computer Science
- AAMAS
- 2012

The logical albeit nontrivial next step of combining multiagent A* search and influence-based abstraction into a single algorithm is taken, and empirical results indicate that A* can provide significant computational savings on top of those already afforded by influence-space search.

Heuristic Search of Multiagent Influence Space

- 2011

Two techniques have substantially advanced efficiency and scalability of multiagent planning. First, heuristic search gains traction by pruning large portions of the joint policy space. Second,…

Solving Multi-agent MDPs Optimally with Conditional Return Graphs

- Computer Science
- 2015

This work proposes CoRe, a novel branch-and-bound policy search algorithm building on CRGs, which typically requires less runtime than the available alternatives and is able to find solutions to problems previously considered unsolvable.

Multi-Agent Planning under Uncertainty with Monte Carlo Q-Value Function

- Engineering
- 2019

Decentralized partially observable Markov decision processes (Dec-POMDPs) are general multi-agent models for planning under uncertainty, but are intractable to solve. Doubly exponential growth of the…

Optimally Solving Dec-POMDPs as Continuous-State MDPs

- Mathematics, Computer Science
- IJCAI
- 2013

This work introduces the idea of transforming a Dec-POMDP into a continuous-state deterministic MDP with a piecewise-linear and convex value function, along with a heuristic search that relies on feature-based compact representations, point-based updates, and efficient action selection.

Accelerated Vector Pruning for Optimal POMDP Solvers

- Computer Science
- AAAI
- 2017

This paper shows how the LPs in POMDP pruning subroutines can be decomposed using a Benders decomposition; the resulting algorithm incrementally adds LP constraints and uses only a small fraction of them.

Producing efficient error-bounded solutions for transition independent decentralized MDPs

- Computer Science
- AAMAS
- 2013

This paper presents the first approach for solving transition independent decentralized Markov decision processes (Dec-MDPs) that inherits error bounds and fast convergence rates, and provides the foundation for the first algorithm for solving infinite-horizon transition-independent decentralized MDPs.

#### References

Showing 1-10 of 23 references

Memory-Bounded Dynamic Programming for DEC-POMDPs

- Mathematics, Computer Science
- IJCAI
- 2007

This work presents the first memory-bounded dynamic programming algorithm for finite-horizon decentralized POMDPs, which can handle horizons that are multiple orders of magnitude larger than what was previously possible, while achieving the same or better solution quality.

MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs

- Computer Science
- UAI
- 2005

This work presents multi-agent A* (MAA*), the first complete and optimal heuristic search algorithm for solving decentralized partially observable Markov decision problems (Dec-POMDPs) with finite horizon, and introduces an anytime variant of MAA*.
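For intuition on why full node expansion is costly in MAA*-style search: extending a depth-t partial joint policy assigns an action to every length-t observation history of every agent, which gives a doubly exponential number of children. A small counting sketch (assuming agents with identical action and observation sets; the helper name is illustrative):

```python
def maa_star_children(num_actions, num_obs, num_agents, depth):
    """Number of children of a depth-`depth` node in MAA*-style search.

    Each agent has num_obs ** depth observation histories of length
    `depth`, and a next-stage decision rule assigns one of num_actions
    actions to each, giving num_actions ** (num_obs ** depth) rules
    per agent; the joint count is that raised to num_agents.
    """
    return (num_actions ** (num_obs ** depth)) ** num_agents
```

With 2 agents, 2 actions, and 2 observations, a depth-1 node already has 16 children, a depth-2 node 256, and a depth-3 node 65536, which is exactly the blow-up that incremental expansion sidesteps.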

Incremental Policy Generation for Finite-Horizon DEC-POMDPs

- Computer Science
- ICAPS
- 2009

A new backup algorithm is proposed that is based on a reachability analysis of the state space; it can produce an optimal solution for any possible initial state, or achieve further scalability by making use of a known start state.

Lossless clustering of histories in decentralized POMDPs

- Computer Science
- AAMAS
- 2009

This work proves that when two histories satisfy the criterion, they have the same optimal value and thus can be treated as one, and demonstrates empirically that it can provide a speed-up of multiple orders of magnitude, allowing the optimal solution of significantly larger problems.

Approximate solutions for partially observable stochastic games with common payoffs

- Computer Science
- Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004.
- 2004

This work proposes an algorithm that approximates POSGs as a series of smaller, related Bayesian games, using heuristics such as QMDP to provide the future discounted value of actions, and results in policies that are locally optimal with respect to the selected heuristic.

Online planning for multi-agent systems with bounded communication

- Computer Science
- Artif. Intell.
- 2011

An online algorithm for planning under uncertainty in multi-agent settings modeled as Dec-POMDPs that can solve problems too large for the best existing offline planning algorithms, and that outperforms the best online method, producing much higher value with much less communication in most cases.

Heuristic search for identical payoff Bayesian games

- Computer Science
- AAMAS
- 2010

A branch-and-bound algorithm that optimally solves identical payoff Bayesian games for coordinating teams of cooperative agents shows a marked improvement over previous methods, obtaining speedups of up to 3 orders of magnitude for synthetic random games, and reaching 10 orders of magnitude for games in a Dec-POMDP context.

Point-based backup for decentralized POMDPs: complexity and new algorithms

- Computer Science
- AAMAS
- 2010

The optimal algorithm exploits recent advances in the weighted CSP literature to overcome the complexity of the backup operation, and the polytime approximation scheme provides a constant factor approximation guarantee based on the number of belief points.

Formal models and algorithms for decentralized decision making under uncertainty

- Computer Science
- Autonomous Agents and Multi-Agent Systems
- 2007

Five different formal frameworks, three different optimal algorithms, as well as a series of approximation techniques are analyzed to provide interesting insights into the structure of decentralized problems, the expressiveness of the various models, and the relative advantages and limitations of the different solution techniques.

Learning Policies for Partially Observable Environments: Scaling Up

- Mathematics, Computer Science
- ICML
- 1995

This paper discusses several simple solution methods and shows that all are capable of finding near-optimal policies for a selection of extremely small POMDPs taken from the learning literature, but shows that none are able to solve a slightly larger and noisier problem based on robot navigation.