Corpus ID: 221670628

Information Particle Filter Tree: An Online Algorithm for POMDPs with Belief-Based Rewards on Continuous Domains

J. Fischer, Ömer Sahin Tas
Planning in Partially Observable Markov Decision Processes (POMDPs) inherently gathers the information necessary to act optimally under uncertainties. The framework can be extended to model pure information gathering tasks by considering belief-based rewards. This allows us to use reward shaping to guide POMDP planning to informative beliefs by using a weighted combination of the original reward and the expected information gain as the objective. In this work we propose a novel online algorithm…
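The weighted objective mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a weighted particle belief over discrete (hashable) states so that entropy can be computed by counting, whereas the paper targets continuous domains. The names `entropy`, `shaped_reward`, and the weight `lam` are hypothetical, and information gain is taken here to be the reduction in Shannon entropy between two beliefs.

```python
import math
from collections import defaultdict


def entropy(particles, weights):
    """Shannon entropy (in nats) of a weighted particle belief.

    Assumption: states are discrete and hashable, so probability mass
    can be accumulated per distinct state.
    """
    total = sum(weights)
    mass = defaultdict(float)
    for state, weight in zip(particles, weights):
        mass[state] += weight / total
    return -sum(p * math.log(p) for p in mass.values() if p > 0.0)


def shaped_reward(state_reward, belief, next_belief, lam=1.0):
    """Original reward plus lam times the entropy reduction.

    belief and next_belief are (particles, weights) pairs.
    lam is a hypothetical trade-off weight, not a value from the paper.
    """
    info_gain = entropy(*belief) - entropy(*next_belief)
    return state_reward + lam * info_gain


# Usage: a uniform belief over four states collapses onto one state.
prior = (["a", "b", "c", "d"], [1.0, 1.0, 1.0, 1.0])
posterior = (["a", "a", "a", "a"], [1.0, 1.0, 1.0, 1.0])
reward = shaped_reward(1.0, prior, posterior, lam=0.5)
```

With `lam=0`, the objective reduces to the original reward; larger values steer planning toward beliefs with lower entropy.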

Simplified Belief-Dependent Reward MCTS Planning with Guaranteed Tree Consistency
This paper presents Simplified Information-Theoretic Particle Filter Tree (SITH-PFT), a novel variant of the MCTS algorithm that considers information-theoretic rewards but avoids the need to calculate them completely.
Online POMDP Planning via Simplification
A novel algorithmic approach, Simplified Information Theoretic Belief Space Planning (SITHBSP), which aims to speed up POMDP planning with belief-dependent rewards, without compromising the solution's accuracy, by mathematically relating the simplified elements of the problem to their counterparts in the original problem.
Probabilistic Loss and its Online Characterization for Simplified Decision Making Under Uncertainty
This work extends the decision making mechanism to the whole by removing standard approximations and considering all previously suppressed stochastic sources of variability and presents a novel framework to simplify decision making while assessing and controlling online the simplification's impact.


Potential-based reward shaping for finite horizon online POMDP planning
The problem of suboptimal behavior during online partially observable Markov decision process (POMDP) planning caused by time constraints on planning is addressed, and potential-based reward shaping (PBRS) is extended from RL to online POMDP planning, enabling the agent to save time to improve the breadth planning and build higher quality plans.
Decision-theoretic planning under uncertainty with information rewards for active cooperative perception
This work presents the POMDP with Information Rewards (POMDP-IR) modeling framework, which rewards an agent for reaching a certain level of belief regarding a state feature, and demonstrates its use for active cooperative perception scenarios.
Information Gathering and Reward Exploitation of Subgoals for POMDPs
Experimental results show that IGRES is an effective multi-purpose POMDP solver, providing state-of-the-art performance for both long horizon planning tasks and information-gathering tasks on benchmark domains.
Monte-Carlo Planning in Large POMDPs
POMCP is the first general purpose planner to achieve high performance in such large and unfactored POMDPs as 10 x 10 battleship and partially observable PacMan, with approximately 10^18 and 10^56 states respectively.
Information theoretic reward shaping for curiosity driven learning in POMDPs
This paper proposes a mechanism for speeding up RL in POMDPs by using an information-based shaping reward, which can be automatically derived from the belief distribution, and shows that the curiosity reward significantly speeds up learning and improves the quality of policies compared to those that use only the extrinsic, task-specific reward signal.
An Online POMDP Solver for Uncertainty Planning in Dynamic Environment
A new online POMDP solver, called Adaptive Belief Tree (ABT), that can reuse and improve an existing solution and update it as needed whenever the POMDP model changes, and that converges in probability to the optimal solution of the current POMDP model.
DESPOT-α: Online POMDP Planning With Large State And Observation Spaces
State-of-the-art sampling-based online POMDP solvers compute near-optimal policies for POMDPs with very large state spaces. However, when faced with large observation spaces, they may become overly…
Online Algorithms for POMDPs with Continuous State, Action, and Observation Spaces
Two new algorithms, POMCPOW and PFT-DPW, are proposed and evaluated that overcome this deficiency by using weighted particle filtering. Simulation results show that these modifications allow the algorithms to be successful where previous approaches fail.
A POMDP Extension with Belief-dependent Rewards
Partially Observable Markov Decision Processes (POMDPs) model sequential decision-making problems under uncertainty and partial observability. Unfortunately, some problems cannot be modeled with…
PUMA: Planning Under Uncertainty with Macro-Actions
This paper presents a POMDP algorithm for planning under uncertainty with macro-actions (PUMA) that automatically constructs and evaluates open-loop macro-actions within forward-search planning, and shows how to incrementally refine the plan over time, resulting in an anytime algorithm that provably converges to an ε-optimal policy.