Optimal and Approximate Q-value Functions for Decentralized POMDPs

@article{Oliehoek2008OptimalAA,
  title={Optimal and Approximate Q-value Functions for Decentralized POMDPs},
  author={F. Oliehoek and M. Spaan and N. Vlassis},
  journal={J. Artif. Intell. Res.},
  year={2008},
  volume={32},
  pages={289-353}
}
Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDPs (Dec-POMDPs).
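As a concrete illustration of the single-agent baseline the abstract refers to, here is a minimal sketch of tabular Q-value iteration with greedy policy extraction. The numpy representation (T as an S x A x S' transition array, R as an S x A reward array) and the function name are assumptions of this sketch, not code from the paper.

import numpy as np

def q_value_iteration(T, R, gamma=0.95, tol=1e-8):
    """Compute the optimal Q-value function Q* of a finite MDP by dynamic
    programming, then extract a greedy optimal policy from it.

    T has shape (S, A, S) with T[s, a, s'] = Pr(s' | s, a);
    R has shape (S, A) with the expected immediate rewards.
    """
    S, A, _ = T.shape
    Q = np.zeros((S, A))
    while True:
        # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * E[max_a' Q(s',a')]
        Q_new = R + gamma * (T @ Q.max(axis=1))
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new, Q_new.argmax(axis=1)  # Q* and a greedy policy
        Q = Q_new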
Citations

Multi-Agent Planning under Uncertainty with Monte Carlo Q-Value Function
Decentralized partially observable Markov decision processes (Dec-POMDPs) are general multi-agent models for planning under uncertainty, but are intractable to solve: the joint policy space grows doubly exponentially with the planning horizon.
Optimally Solving Dec-POMDPs as Continuous-State MDPs
Introduces the idea of transforming a Dec-POMDP into a continuous-state deterministic MDP with a piecewise-linear and convex value function, together with a heuristic search that relies on feature-based compact representations, point-based updates, and efficient action selection.
An Investigation into Mathematical Programming for Finite Horizon Decentralized POMDPs
Studies an alternate formulation of Dec-POMDPs relying on a sequence-form representation of policies, and shows how to derive mixed integer linear programming (MILP) problems that, once solved, give exact optimal solutions to the decentralized partially observable Markov decision process.
Mixed Integer Linear Programming for Exact Finite-Horizon Planning in Decentralized POMDPs
Represents each agent's policy in the sequence form rather than the tree form, thereby obtaining a very compact representation of the set of joint policies, and solves the problem of finding an n-agent joint policy for the optimal finite-horizon control of a decentralized POMDP.
Using linear programming duality for solving finite horizon Dec-POMDPs
This paper studies the problem of finding an optimal finite horizon joint policy for a decentralized partially observable Markov decision process (Dec-POMDP). We present a new algorithm for finding such an optimal joint policy.
Sufficient Plan-Time Statistics for Decentralized POMDPs
This paper makes a contribution to the theory of decentralized POMDPs by showing how the dependence on the 'past joint policy' can be replaced by a sufficient statistic, and extends the results to the case of k-step delayed communication.
Mathematical programming methods for decentralized POMDPs
A new mathematical programming based approach for exactly solving a finite horizon Dec-POMDP; using the sequence form of a control policy, the problem can be formulated as a mathematical program with a nonlinear objective and linear constraints.
Lossless clustering of histories in decentralized POMDPs
This work proves that when two histories satisfy the criterion, they have the same optimal value and thus can be treated as one, and demonstrates empirically that this clustering can provide a speed-up of multiple orders of magnitude, allowing the optimal solution of significantly larger problems.
Decentralized POMDPs
This chapter presents an overview of the decentralized POMDP (Dec-POMDP) framework, covering the forward heuristic search approach to solving Dec-POMDPs as well as the backward dynamic programming approach.
Fuzzy reinforcement learning control for decentralized partially observable Markov decision processes
Rajneesh Sharma, M. Spaan · 2011 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2011)
The main contributions of the work are the introduction of a game-based RL paradigm in a Dec-POMDP setting, and the use of fuzzy inference systems to effectively generalize the underlying belief space.

References

Showing 1-10 of 95 references
Q-value functions for decentralized POMDPs
It is argued that searching for the optimal Q-value function may be as costly as exhaustive policy search, and various approximate Q-value functions that allow efficient computation are analyzed.
A heuristic approach for solving decentralized-POMDP: assessment on the pursuit problem
This paper proposes a heuristic approach for solving decentralized POMDPs when agents are memory-less and when the global reward function can be broken up into a sum of local reward functions.
Mixed Integer Linear Programming for Exact Finite-Horizon Planning in Decentralized POMDPs
Represents each agent's policy in the sequence form rather than the tree form, thereby obtaining a very compact representation of the set of joint policies, and solves the problem of finding an n-agent joint policy for the optimal finite-horizon control of a decentralized POMDP.
Communications for improving policy computation in distributed POMDPs
This paper shows how communicative acts can be explicitly introduced in order to find locally optimal joint policies that allow agents to coordinate better through synchronization achieved via communication, and develops a novel compact policy representation that results in savings of both space and time.
Dec-POMDPs with delayed communication
In this work we consider the problem of multiagent planning under sensing and acting uncertainty with a one time-step delay in communication. We adopt decentralized partially observable Markov decision processes (Dec-POMDPs) as the planning framework.
Complexity analysis and optimal algorithms for decentralized decision making
Coordination of distributed entities is required for problems arising in many areas, including multi-robot systems, networking applications, e-commerce applications, and the control of autonomous systems.
Winning back the CUP for distributed POMDPs: planning over continuous belief spaces
A novel algorithm is provided to explicitly compute finite horizon policies over continuous belief spaces, without restricting the space of policies, and locally optimal joint policies are obtained.
A Cross-Entropy Approach to Solving Dec-POMDPs
This paper focuses on the decentralized POMDP (Dec-POMDP) model for multiagent planning under uncertainty, seeking a set of optimal policies for the agents that maximize the expected shared reward.
Decentralized planning under uncertainty for teams of communicating agents
This work explores iterative methods for approximately solving decentralized Markov decision processes, and models communication as an integral part of the agent's reasoning, in which the meaning of a message is directly encoded in the policy of the communicating agent.
Perseus: Randomized Point-based Value Iteration for POMDPs
This work presents a randomized point-based value iteration algorithm called PERSEUS, which backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set.
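To make the backup mechanism concrete, here is a minimal PERSEUS-style value-update stage in numpy. The array shapes (T[a] as an S x S transition matrix, O[a] as an S x Z observation matrix, R as S x A rewards) and the function names are assumptions of this sketch, not the authors' implementation:

import numpy as np

def backup(b, V, T, O, R, gamma):
    # Point-based backup of belief b against the current alpha-vector set V.
    S, A = R.shape
    best, best_val = None, -np.inf
    for a in range(A):
        g = R[:, a].astype(float)
        for z in range(O[a].shape[1]):
            P = T[a] * O[a][:, z]              # P[s, s'] = Pr(s', z | s, a)
            cand = [P @ alpha for alpha in V]  # back-projected vectors
            g = g + gamma * cand[int(np.argmax([b @ c for c in cand]))]
        if b @ g > best_val:
            best, best_val = g, b @ g
    return best

def perseus_stage(B, V, T, O, R, gamma):
    # Back up randomly chosen beliefs until every belief in B has improved
    # (or at least kept) its value -- the randomized PERSEUS update.
    value = lambda b, vs: max(b @ alpha for alpha in vs)
    V_new, todo = [], list(B)
    while todo:
        b = todo[np.random.randint(len(todo))]
        alpha = backup(b, V, T, O, R, gamma)
        if b @ alpha < value(b, V):            # no improvement: keep old best
            alpha = max(V, key=lambda a_: b @ a_)
        V_new.append(alpha)
        todo = [b2 for b2 in todo if value(b2, V_new) < value(b2, V)]
    return V_new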