Corpus ID: 1453824

Hierarchical POMDP Controller Optimization by Likelihood Maximization

Marc Toussaint, Laurent Charlin, Pascal Poupart
Planning can often be simplified by decomposing the task into smaller tasks arranged hierarchically. Charlin et al. [4] recently showed that the hierarchy discovery problem can be framed as a non-convex optimization problem. However, the inherent computational difficulty of solving such an optimization problem makes it hard to scale to real-world problems. In another line of research, Toussaint et al. [18] developed a method to solve planning problems by maximum-likelihood estimation. In this…
POMDP Planning by Marginal-MAP Probabilistic Inference in Generative Models
While most current POMDP planning methods have focused on the development of scalable approximate algorithms, they often neglect the important aspect of solution quality and sacrifice performance…
Chapter 1: Expectation-Maximization methods for solving (PO)MDPs and optimal control problems
As this book demonstrates, the development of efficient probabilistic inference techniques has made considerable progress in recent years, in particular with respect to exploiting the structure…
Policy optimization by marginal-map probabilistic inference in generative models
This work re-formulates POMDP planning as a task of marginal-MAP “mix” (max-sum) inference with respect to a new single-DBN generative model, defines a dual representation of the MMAP problem, and derives a Bayesian variational approximation framework with an upper bound.
Hierarchical Monte-Carlo Planning
This work proposes novel, scalable MCTS methods which integrate a task hierarchy into the MCTS framework, specifically leading to hierarchical versions of both UCT and POMCP.
Analyzing and Escaping Local Optima in Planning as Inference for Partially Observable Domains
This work investigates the local optima of finite state controllers in single-agent partially observable Markov decision processes (POMDPs) that are optimized by expectation maximization (EM), and proposes two algorithms that can robustly escape local optima.
A Novel Single-DBN Generative Model for Optimizing POMDP Controllers by Probabilistic Inference
A novel single-DBN generative model is designed that ensures that the task of probabilistic inference is equivalent to the original problem of optimizing POMDP controllers, and several inference approaches are developed to approximate the value of the policy when exact inference methods are not tractable to solve large-size problems with complex graphical models.
Anytime Planning for Decentralized POMDPs using Expectation Maximization
This work presents a promising new class of algorithms for the infinite-horizon case, which recasts the optimization problem as inference in a mixture of DBNs and derives the Expectation Maximization (EM) algorithm to optimize the joint policy represented as DBNs.
Scalable Multiagent Planning Using Probabilistic Inference
This work identifies certain mild conditions that are sufficient to make multiagent planning amenable to a scalable approximation w.r.t. the number of agents and derives a global update rule that combines these local inferences to monotonically increase the overall solution quality.
Bayesian Time Series Models: Expectation maximisation methods for solving (PO)MDPs and optimal control problems
This chapter shows that efficient probabilistic inference techniques can also be used for solving Markov Decision Processes or partially observable MDPs when formulated in terms of a structured dynamic Bayesian network (DBN).
Leveraging Task Knowledge for Robot Motion Planning Under Uncertainty
This work proposes a hierarchical POMDP planner that develops locally optimal motion plans for hybrid dynamics models and evaluates the proposed planner for two navigation and localization tasks in simulated domains, as well as an assembly task with a real robotic manipulator.


Automated Hierarchy Discovery for Planning in Partially Observable Environments
This paper frames the optimization of a hierarchical policy as a non-convex optimization problem that can be solved with general non-linear solvers, a mixed-integer non-linear approximation or a form of bounded hierarchical policy iteration.
Solving POMDPs using quadratically constrained linear programs
This work describes a new approach that addresses the space requirement of POMDP algorithms while maintaining well-defined optimality guarantees.
Tractable planning under uncertainty: exploiting structure
The problem of planning under uncertainty has received significant attention in the scientific community over the past few years. It is now well-recognized that considering uncertainty during…
Stochastic Local Search for POMDP Controllers
The heuristics used in this procedure mimic the sequential reasoning inherent in optimal dynamic programming (DP) approaches and are competitive with (and, for some problems, superior to) other state-of-the-art controller and DP-based algorithms on large-scale POMDPs.
Probabilistic inference for solving (PO)MDPs
The approach is based on an equivalence between maximization of the expected future return in the time-unlimited MDP and likelihood maximization in a related mixture of finite-time MDPs, which allows the use of expectation maximization (EM) for computing optimal policies, with arbitrary inference techniques in the E-step.
Representing hierarchical POMDPs as DBNs for multi-scale robot localization
We explore the advantages of representing hierarchical partially observable Markov decision processes (H-POMDPs) as dynamic Bayesian networks (DBNs). In particular, we focus on the special case of…
Synthesis of Hierarchical Finite-State Controllers for POMDPs
A planning algorithm is described that uses a programmer-defined task hierarchy to constrain the search space of finite-state controllers, and it is proved that this algorithm converges to a hierarchical finite-state controller that is ε-optimal in a limited but well-defined sense, related to the concept of recursive optimality.
Heuristic Search Value Iteration for POMDPs
HSVI is an anytime algorithm that returns a policy and a provable bound on its regret with respect to the optimal policy and is applied to a new rover exploration problem 10 times larger than most POMDP problems in the literature.
Exact and approximate algorithms for partially observable Markov decision processes
Automated sequential decision making is crucial in many contexts. In the face of uncertainty, this task becomes even more important, though at the same time, computing optimal decision policies…
An Improved Policy Iteration Algorithm for Partially Observable MDPs
  • E. Hansen
  • Mathematics, Computer Science
  • NIPS
  • 1997
The paper's contribution is to show that the dynamic-programming update used in the policy improvement step can be interpreted as the transformation of a finite-state controller into an improved finite-state controller.