Corpus ID: 234470158

Online POMDP Planning via Simplification

Ori Sztyglic and V. Indelman
In this paper, we consider online planning in partially observable domains. Solving the corresponding POMDP problem is a very challenging task, particularly in an online setting. Our key contribution is a novel algorithmic approach, Simplified Information Theoretic Belief Space Planning (SITHBSP), which aims to speed up POMDP planning with belief-dependent rewards without compromising the solution’s accuracy. We do so by mathematically relating the simplified elements of the problem…

Simplified Belief-Dependent Reward MCTS Planning with Guaranteed Tree Consistency
This paper presents Simplified Information-Theoretic Particle Filter Tree (SITH-PFT), a novel variant of the MCTS algorithm that considers information-theoretic rewards but avoids the need to calculate them completely.


Multilevel Monte-Carlo for Solving POMDPs Online
MLPP combines the well-known Monte-Carlo Tree Search with the concept of Multilevel Monte Carlo to speed up the generation of approximately optimal solutions for POMDPs with complex dynamics. Experiments indicate that MLPP substantially outperforms state-of-the-art POMDP solvers.
Monte-Carlo Planning in Large POMDPs
POMCP is the first general-purpose planner to achieve high performance in such large and unfactored POMDPs as 10 × 10 battleship and partially observable PacMan, with approximately 10^18 and 10^56 states respectively.
Information Particle Filter Tree: An Online Algorithm for POMDPs with Belief-Based Rewards on Continuous Domains
This work proposes a novel online algorithm, Information Particle Filter Tree (IPFT), to solve problems with belief-dependent rewards on continuous domains and shows that the consideration of information gain greatly improves the performance in problems where information gathering is an essential part of the optimal policy.
DESPOT-α: Online POMDP Planning With Large State And Observation Spaces
State-of-the-art sampling-based online POMDP solvers compute near-optimal policies for POMDPs with very large state spaces. However, when faced with large observation spaces, they may become overly…
rho-POMDPs have Lipschitz-Continuous epsilon-Optimal Value Functions
Many state-of-the-art algorithms for solving Partially Observable Markov Decision Processes (POMDPs) rely on turning the problem into a “fully observable” problem—a belief MDP—and exploiting the…
Online Algorithms for POMDPs with Continuous State, Action, and Observation Spaces
Two new algorithms, POMCPOW and PFT-DPW, are proposed and evaluated that overcome this deficiency by using weighted particle filtering. Simulation results show that these modifications allow the algorithms to be successful where previous approaches fail.
The Complexity of Markov Decision Processes
All three variants of the classical problem of optimal policy computation in Markov decision processes (finite-horizon, infinite-horizon discounted, and infinite-horizon average-cost) are shown to be complete for P, and therefore most likely cannot be solved by highly parallel algorithms.
Heuristic Search Value Iteration for POMDPs
HSVI is an anytime algorithm that returns a policy and a provable bound on its regret with respect to the optimal policy and is applied to a new rover exploration problem 10 times larger than most POMDP problems in the literature.
Sparse tree search optimality guarantees in POMDPs with continuous observation spaces
It is proved that a simplified algorithm, partially observable weighted sparse sampling (POWSS), will estimate Q-values accurately with high probability and can be made to perform arbitrarily near the optimal solution by increasing computational power.
Point-Based POMDP Algorithms: Improved Analysis and Implementation
A new bound for point-based POMDP value iteration algorithms is derived that relies on both the curse of dimensionality and the curse of history and uses the concept of discounted reachability; the conclusions may help guide future algorithm design.