# Hierarchical POMDP Controller Optimization by Likelihood Maximization

@inproceedings{Toussaint2008HierarchicalPC, title={Hierarchical POMDP Controller Optimization by Likelihood Maximization}, author={Marc Toussaint and Laurent Charlin and Pascal Poupart}, booktitle={UAI}, year={2008} }

Planning can often be simplified by decomposing the task into smaller tasks arranged hierarchically. Charlin et al. [4] recently showed that the hierarchy discovery problem can be framed as a non-convex optimization problem. However, the inherent computational difficulty of solving such an optimization problem makes it hard to scale to real-world problems. In another line of research, Toussaint et al. [18] developed a method to solve planning problems by maximum-likelihood estimation. In this…
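The likelihood-maximization idea referenced above can be illustrated with a minimal sketch. Everything here is an illustrative assumption, not taken from the paper: a made-up two-state, two-action MDP with rewards scaled to [0, 1] (the reward-as-probability view requires this). The EM-style update multiplies the current stochastic policy by its Q-values and renormalizes, which monotonically increases expected return:

```python
import numpy as np

# Toy MDP (illustrative numbers, not from the paper).
P = np.zeros((2, 2, 2))          # P[s, a, s'] transition probabilities
P[0, 0] = [0.9, 0.1]
P[0, 1] = [0.2, 0.8]
P[1, 0] = [0.5, 0.5]
P[1, 1] = [0.1, 0.9]
R = np.array([[0.0, 0.1],        # R[s, a] expected immediate reward in [0, 1]
              [0.2, 1.0]])
gamma = 0.9

pi = np.full((2, 2), 0.5)        # stochastic policy pi[s, a], start uniform

def q_values(pi):
    """Evaluate Q^pi exactly by solving the linear Bellman system."""
    r_pi = (pi * R).sum(axis=1)              # expected reward under pi
    P_pi = np.einsum('sa,sat->st', pi, P)    # state-to-state kernel under pi
    v = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)
    return R + gamma * P @ v                 # Q[s, a]

for _ in range(100):
    Q = q_values(pi)
    pi = pi * Q                              # E-step: reweight actions by Q
    pi /= pi.sum(axis=1, keepdims=True)      # M-step: renormalize the policy

print(np.round(pi, 3))   # the policy concentrates on the better action (a=1 here)
```

With nonnegative rewards this multiplicative update behaves like a soft policy iteration: mass shifts geometrically toward actions with higher Q-values, so after enough iterations the policy is effectively deterministic.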

#### 91 Citations

POMDP Planning by Marginal-MAP Probabilistic Inference in Generative Models

- 2014

While most current POMDP planning methods have focused on the development of scalable approximate algorithms, they often neglect the important aspect of solution quality and sacrifice performance…

Chapter 1: Expectation-Maximization methods for solving (PO)MDPs and optimal control problems

- 2009

As this book demonstrates, the development of efficient probabilistic inference techniques has made considerable progress in recent years, in particular with respect to exploiting the structure…

Policy optimization by marginal-map probabilistic inference in generative models

- Computer Science
- AAMAS
- 2014

This work reformulates POMDP planning as a task of marginal-MAP "mix" (max-sum) inference with respect to a new single-DBN generative model, defines a dual representation of the MMAP problem, and derives a Bayesian variational approximation framework with an upper bound.

Hierarchical Monte-Carlo Planning

- Computer Science
- AAAI
- 2015

This work proposes novel, scalable MCTS methods which integrate a task hierarchy into the MCTS framework, specifically leading to hierarchical versions of both UCT and POMCP.

Analyzing and Escaping Local Optima in Planning as Inference for Partially Observable Domains

- Mathematics, Computer Science
- ECML/PKDD
- 2011

This work investigates the local optima of finite state controllers in single agent partially observable Markov decision processes (POMDPs) that are optimized by expectation maximization (EM), and proposes two algorithms that can robustly escape local optima.

A Novel Single-DBN Generative Model for Optimizing POMDP Controllers by Probabilistic Inference

- Computer Science
- AAAI
- 2014

A novel single-DBN generative model is designed that ensures the task of probabilistic inference is equivalent to the original problem of optimizing POMDP controllers; several inference approaches are developed to approximate the value of the policy when exact inference methods are intractable for large problems with complex graphical models.

Anytime Planning for Decentralized POMDPs using Expectation Maximization

- Computer Science
- UAI
- 2010

This work presents a promising new class of algorithms for the infinite-horizon case, which recasts the optimization problem as inference in a mixture of DBNs and derives the Expectation Maximization (EM) algorithm to optimize the joint policy represented as DBNs.

Scalable Multiagent Planning Using Probabilistic Inference

- Computer Science
- IJCAI
- 2011

This work identifies certain mild conditions that are sufficient to make multiagent planning amenable to a scalable approximation w.r.t. the number of agents and derives a global update rule that combines these local inferences to monotonically increase the overall solution quality.

Bayesian Time Series Models: Expectation maximisation methods for solving (PO)MDPs and optimal control problems

- Mathematics, Computer Science
- 2011

This chapter shows that efficient probabilistic inference techniques can also be used for solving Markov decision processes or partially observable MDPs when formulated in terms of a structured dynamic Bayesian network (DBN).

Leveraging Task Knowledge for Robot Motion Planning Under Uncertainty

- Computer Science
- 2018

This work proposes a hierarchical POMDP planner that develops locally optimal motion plans for hybrid dynamics models and evaluates the proposed planner for two navigation and localization tasks in simulated domains, as well as an assembly task with a real robotic manipulator.

#### References

Showing 1–10 of 19 references

Automated Hierarchy Discovery for Planning in Partially Observable Environments

- Mathematics, Computer Science
- NIPS
- 2006

This paper frames the optimization of a hierarchical policy as a non-convex optimization problem that can be solved with general non-linear solvers, a mixed-integer non-linear approximation, or a form of bounded hierarchical policy iteration.

Solving POMDPs using quadratically constrained linear programs

- Computer Science, Mathematics
- AAMAS '06
- 2006

This work describes a new approach that addresses the space requirement of POMDP algorithms while maintaining well-defined optimality guarantees.

Tractable planning under uncertainty: exploiting structure

- Engineering
- 2004

The problem of planning under uncertainty has received significant attention in the scientific community over the past few years. It is now well-recognized that considering uncertainty during…

Stochastic Local Search for POMDP Controllers

- Computer Science
- AAAI
- 2004

The heuristics used in this procedure mimic the sequential reasoning inherent in optimal dynamic programming (DP) approaches and are competitive with (and, for some problems, superior to) other state-of-the-art controller and DP-based algorithms on large-scale POMDPs.

Probabilistic inference for solving (PO)MDPs

- Computer Science
- 2006

The approach is based on an equivalence between maximization of the expected future return in the time-unlimited MDP and likelihood maximization in a related mixture of finite-time MDPs, which makes it possible to use expectation maximization (EM) for computing optimal policies, with arbitrary inference techniques in the E-step.
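The equivalence this abstract refers to can be checked numerically in a small sketch (the reward sequence below is made up for illustration): the discounted return Σ_t γ^t r_t equals 1/(1−γ) times the expected terminal reward under a geometric mixture over horizons T with weight (1−γ)γ^T, which is what lets return maximization be treated as likelihood maximization:

```python
import numpy as np

gamma = 0.9
r = np.array([0.0, 0.2, 0.5, 1.0, 0.3])   # toy reward sequence r_t, t = 0..4

# Standard discounted return of the sequence.
discounted = sum(gamma**t * r[t] for t in range(len(r)))

# "Likelihood" view: terminate at horizon T with probability (1-gamma)*gamma^T
# and score only the reward at the final step T.
mixture = sum((1 - gamma) * gamma**T * r[T] for T in range(len(r)))

# The two quantities agree up to the constant factor 1/(1-gamma).
assert np.isclose(discounted, mixture / (1 - gamma))
```

Because the two objectives differ only by a positive constant, a policy that maximizes the mixture likelihood also maximizes the discounted return, so standard EM machinery can be applied to planning.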

Representing hierarchical POMDPs as DBNs for multi-scale robot localization

- Computer Science
- IEEE International Conference on Robotics and Automation (ICRA)
- 2004

We explore the advantages of representing hierarchical partially observable Markov decision processes (H-POMDPs) as dynamic Bayesian networks (DBNs). In particular, we focus on the special case of…

Synthesis of Hierarchical Finite-State Controllers for POMDPs

- Computer Science, Mathematics
- ICAPS
- 2003

A planning algorithm is described that uses a programmer-defined task hierarchy to constrain the search space of finite-state controllers, and it is proved that this algorithm converges to a hierarchical finite-state controller that is ε-optimal in a limited but well-defined sense, related to the concept of recursive optimality.

Heuristic Search Value Iteration for POMDPs

- Computer Science, Mathematics
- UAI
- 2004

HSVI is an anytime algorithm that returns a policy and a provable bound on its regret with respect to the optimal policy, and is applied to a new rover exploration problem 10 times larger than most POMDP problems in the literature.

Exact and approximate algorithms for partially observable Markov decision processes

- Mathematics
- 1998

Automated sequential decision making is crucial in many contexts. In the face of uncertainty, this task becomes even more important, though at the same time, computing optimal decision policies…

An Improved Policy Iteration Algorithm for Partially Observable MDPs

- Mathematics, Computer Science
- NIPS
- 1997

The paper's contribution is to show that the dynamic-programming update used in the policy improvement step can be interpreted as the transformation of a finite-state controller into an improved finite-state controller.