# Optimal and Approximate Q-value Functions for Decentralized POMDPs

@article{Oliehoek2008OptimalAA,
  title   = {Optimal and Approximate Q-value Functions for Decentralized POMDPs},
  author  = {F. Oliehoek and M. Spaan and N. Vlassis},
  journal = {J. Artif. Intell. Res.},
  year    = {2008},
  volume  = {32},
  pages   = {289--353}
}

Decision-theoretic planning is a popular approach to sequential decision-making problems because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed recursively by dynamic programming, and an optimal policy is then extracted from Q*. In this paper we study whether similar Q-value functions can be defined for…
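The single-agent recursion the abstract refers to can be sketched briefly. This is a minimal illustration for a fully observable MDP (the simplest case mentioned); the toy transition and reward arrays below are hypothetical and not taken from the paper:

```python
import numpy as np

# Toy 2-state, 2-action MDP (hypothetical example, not from the paper).
# T[s, a, s'] = transition probability, R[s, a] = immediate reward.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.1, 0.9], [0.8, 0.2]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9  # discount factor

# Dynamic-programming recursion:
# Q(s,a) = R(s,a) + gamma * sum_s' T(s,a,s') * max_a' Q(s',a')
Q = np.zeros((2, 2))
for _ in range(1000):
    Q = R + gamma * (T @ Q.max(axis=1))

# Extract a greedy policy from Q*: pi(s) = argmax_a Q*(s,a).
policy = Q.argmax(axis=1)
```

In the partially observable (POMDP) and decentralized (Dec-POMDP) settings studied in the paper, the state argument above is replaced by belief states or observation histories, which is precisely where defining an analogous Q* becomes nontrivial.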


#### 221 Citations

Multi-Agent Planning under Uncertainty with Monte Carlo Q-Value Function

- Engineering
- 2019

Decentralized partially observable Markov decision processes (Dec-POMDPs) are general multi-agent models for planning under uncertainty, but are intractable to solve. Doubly exponential growth of the…

Optimally Solving Dec-POMDPs as Continuous-State MDPs

- Mathematics, Computer Science
- IJCAI
- 2013

The idea of transforming a Dec-POMDP into a continuous-state deterministic MDP with a piecewise-linear and convex value function is introduced, along with a heuristic search that relies on feature-based compact representations, point-based updates, and efficient action selection.

An Investigation into Mathematical Programming for Finite Horizon Decentralized POMDPs

- Computer Science
- J. Artif. Intell. Res.
- 2010

This paper studies an alternate formulation of DEC-POMDPs relying on a sequence-form representation of policies, and shows how to derive Mixed Integer Linear Programming (MILP) problems that, once solved, give exact optimal solutions to the decentralized partially observable Markov decision process.

Mixed Integer Linear Programming for Exact Finite-Horizon Planning in Decentralized POMDPs

- Computer Science, Mathematics
- ICAPS
- 2007

This paper represents each agent's policy in the sequence-form rather than the tree-form, thereby obtaining a very compact representation of the set of joint policies, and solves the problem of finding an n-agent joint policy for the optimal finite-horizon control of a decentralized POMDP.

Using linear programming duality for solving finite horizon Dec-POMDPs

- Mathematics
- 2008

This paper studies the problem of finding an optimal finite horizon joint policy for a decentralized partially observable Markov decision process (Dec-POMDP). We present a new algorithm for finding…

Sufficient Plan-Time Statistics for Decentralized POMDPs

- Computer Science
- IJCAI
- 2013

This paper contributes to the theory of decentralized POMDPs by showing how the dependence on the past joint policy can be replaced by a sufficient statistic, and extends the results to the case of k-step delayed communication.

Mathematical programming methods for decentralized POMDPs

- Computer Science
- 2008

A new mathematical programming based approach for exactly solving a finite horizon DEC-POMDP, which uses the sequence form of a control policy and shows how the problem can be formulated as a mathematical program with a nonlinear objective and linear constraints.

Lossless clustering of histories in decentralized POMDPs

- Computer Science
- AAMAS
- 2009

This work proves that when two histories satisfy the criterion, they have the same optimal value and thus can be treated as one, and demonstrates empirically that it can provide a speed-up of multiple orders of magnitude, allowing the optimal solution of significantly larger problems.

Decentralized POMDPs

- Computer Science
- Reinforcement Learning
- 2012

This chapter presents an overview of the decentralized POMDP (Dec-POMDP) framework, and covers the forward heuristic search approach to solving Dec-POMDPs, as well as the backward dynamic programming approach.

Fuzzy reinforcement learning control for decentralized partially observable Markov decision processes

- Mathematics, Computer Science
- 2011 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2011)
- 2011

The main contributions of the work are the introduction of a game-based RL paradigm in a Dec-POMDP setting, and the use of fuzzy inference systems to effectively generalize the underlying belief space.

#### References

Showing 1–10 of 95 references

Q-value functions for decentralized POMDPs

- Computer Science
- AAMAS '07
- 2007

It is argued that searching for the optimal Q-value function may be as costly as exhaustive policy search, and various approximate Q-value functions that allow efficient computation are analyzed.

A heuristic approach for solving decentralized-POMDP: assessment on the pursuit problem

- Computer Science
- SAC '02
- 2002

This paper proposes a heuristic approach for solving decentralized partially observable Markov decision processes when agents are memory-less and when the global reward function can be broken up into a sum of local reward functions.

Mixed Integer Linear Programming for Exact Finite-Horizon Planning in Decentralized POMDPs

- Computer Science, Mathematics
- ICAPS
- 2007

This paper represents each agent's policy in the sequence-form rather than the tree-form, thereby obtaining a very compact representation of the set of joint policies, and solves the problem of finding an n-agent joint policy for the optimal finite-horizon control of a decentralized POMDP.

Communications for improving policy computation in distributed POMDPs

- Computer Science
- Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004.
- 2004

This paper shows how communicative acts can be explicitly introduced in order to find locally optimal joint policies that allow agents to coordinate better through synchronization achieved via communication, and develops a novel compact policy representation that results in savings of both space and time.

Dec-POMDPs with delayed communication

- Mathematics
- 2007

In this work we consider the problem of multiagent planning under sensing and acting uncertainty with a one time-step delay in communication. We adopt decentralized partially observable Markov…

Complexity analysis and optimal algorithms for decentralized decision making

- Mathematics
- 2005

Coordination of distributed entities is required for problems arising in many areas, including multi-robot systems, networking applications, e-commerce applications, and the control of autonomous…

Winning back the CUP for distributed POMDPs: planning over continuous belief spaces

- Computer Science
- AAMAS '06
- 2006

A novel algorithm is provided to explicitly compute finite horizon policies over continuous belief spaces, without restricting the space of policies, and locally optimal joint policies are obtained.

A Cross-Entropy Approach to Solving Dec-POMDPs

- Mathematics, Computer Science
- IDC
- 2007

This paper focuses on the decentralized POMDP (Dec-POMDP) model for multiagent planning under uncertainty, and on finding a set of optimal policies for the agents that maximize the expected shared reward.

Decentralized planning under uncertainty for teams of communicating agents

- Computer Science
- AAMAS '06
- 2006

This work explores iterative methods for approximately solving decentralized Markov decision processes, and models communication as an integral part of the agent's reasoning, in which the meaning of a message is directly encoded in the policy of the communicating agent.

Perseus: Randomized Point-based Value Iteration for POMDPs

- Mathematics, Computer Science
- J. Artif. Intell. Res.
- 2005

This work presents a randomized point-based value iteration algorithm called PERSEUS, which backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set.