Partially Observable Markov Decision Processes

@inproceedings{Spaan2012PartiallyOM,
  title={Partially Observable Markov Decision Processes},
  author={M. Spaan},
  booktitle={Reinforcement Learning},
  year={2012}
}
For reinforcement learning in environments in which an agent has access to a reliable state signal, methods based on the Markov decision process (MDP) have had many successes. [...] Next, we give a review of model-based techniques for policy computation, followed by an overview of the available model-free methods for POMDPs. We conclude by highlighting recent trends in POMDP reinforcement learning.
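
Model-based POMDP techniques typically operate on the belief state, the posterior distribution over states given the action-observation history. As a concrete illustration (a minimal sketch, not code from the chapter; all names and the toy numbers are assumptions), the Bayes-filter belief update for a discrete POMDP can be written as:

    import numpy as np

    def belief_update(b, a, o, T, Z):
        """Bayes-filter belief update for a discrete POMDP.

        b: current belief over S states, shape (S,)
        a: action index
        o: observation index
        T: transition model, T[a, s, s2] = P(s2 | s, a)
        Z: observation model, Z[a, s2, o] = P(o | s2, a)
        Returns the posterior belief over states, shape (S,).
        """
        predicted = b @ T[a]                 # predictive P(s2 | b, a)
        unnormalized = predicted * Z[a, :, o]
        return unnormalized / unnormalized.sum()

    # Tiny two-state example with made-up numbers:
    T = np.array([[[0.9, 0.1], [0.1, 0.9]]])
    Z = np.array([[[0.85, 0.15], [0.15, 0.85]]])
    print(belief_update(np.array([0.5, 0.5]), a=0, o=0, T=T, Z=Z))

Since this update is itself the transition function of a continuous-state "belief MDP", the model-based and model-free methods surveyed in the chapter can both be understood as planning or learning over beliefs.

Citations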
Decision-theoretic planning under uncertainty with information rewards for active cooperative perception
This work presents the POMDP with Information Rewards (POMDP-IR) modeling framework, which rewards an agent for reaching a certain level of belief regarding a state feature, and demonstrates its use for active cooperative perception scenarios.
Exploiting submodular value functions for scaling up active perception
Greedy point-based value iteration (PBVI) is proposed, a new POMDP planning method that uses greedy maximization to greatly improve scalability in the action space of an active perception POMDP.
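
For monotone submodular objectives, greedy maximization carries the classic (1 - 1/e) approximation guarantee of Nemhauser et al., which is what makes this kind of scaling possible. A generic sketch of the greedy step follows; value_of_set is a hypothetical stand-in for the submodular value of a set of sensing actions, not the paper's API:

    def greedy_max(candidates, k, value_of_set):
        """Greedily build a set of k elements, assuming value_of_set
        is monotone submodular; the greedy set then achieves at least
        (1 - 1/e) of the optimal value."""
        selected = []
        for _ in range(k):
            remaining = [c for c in candidates if c not in selected]
            if not remaining:
                break
            best = max(remaining,
                       key=lambda c: value_of_set(selected + [c]))
            selected.append(best)
        return selected
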
Off-Policy Evaluation in Partially Observable Environments
A model in which observed and unobserved variables are decoupled into two dynamic processes, called a Decoupled POMDP, is formulated, and it is shown how off-policy evaluation can be performed under this new model, mitigating estimation errors inherent to general POMDPs.
Sufficient Plan-Time Statistics for Decentralized POMDPs
This paper contributes to the theory of decentralized POMDPs by showing how the dependence of an agent's decision rule on the past joint policy can be replaced by a sufficient statistic, and extends the results to the case of k-step delayed communication.
Robot Planning with Constrained Markov Decision Processes
This dissertation proposes a hierarchical approach that significantly reduces the computational time of solving a CMDP instance while preserving the existence of a valid solution, and presents a planner that finds a plan satisfying multiple tasks with given probabilities while respecting various constraints on its cost functions.
Modeling Biological Agents Beyond the Reinforcement-learning Paradigm
This paper reviews two interaction-driven tasks, the AB and AABB tasks, and implements a non-Markov reinforcement-learning (RL) algorithm based on historical sequences and Q-learning that supports the constructivist paradigm for modeling biological agents.
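
A common way to implement such history-based, non-Markov RL is tabular Q-learning over a fixed window of recent observations; the sketch below makes that concrete (the environment interface and window length are assumptions, not the paper's code):

    import random
    from collections import defaultdict, deque

    def history_q_learning(env, episodes, window=4,
                           alpha=0.1, gamma=0.95, epsilon=0.1):
        """Tabular Q-learning whose 'state' is the tuple of the last
        `window` observations, which can make a non-Markov observation
        stream approximately Markov. Assumes env.reset() -> obs,
        env.step(a) -> (obs, reward, done), and env.actions (a list).
        """
        Q = defaultdict(float)
        for _ in range(episodes):
            hist = deque([env.reset()], maxlen=window)
            done = False
            while not done:
                s = tuple(hist)
                # Epsilon-greedy action selection over the history state.
                if random.random() < epsilon:
                    a = random.choice(env.actions)
                else:
                    a = max(env.actions, key=lambda x: Q[(s, x)])
                obs, r, done = env.step(a)
                hist.append(obs)
                s2 = tuple(hist)
                best_next = max(Q[(s2, x)] for x in env.actions)
                target = r + (0.0 if done else gamma * best_next)
                Q[(s, a)] += alpha * (target - Q[(s, a)])
        return Q
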
Sequential Action and Beliefs Under Partially Observable DSGE Environments
This paper introduces a classification of DSGEs from a Markovian perspective, and positions the class of Partially Observable Markov Decision Processes (POMDPs) at the center of a generalization of [...]
Degradation of Performance in Reinforcement Learning with State Measurement Uncertainty
The purpose of the research was to assess the applicability of reinforcement learning to real-world self-protection of military platforms, where the observed state space is expected to be uncertain at best.
Reinforcement Learning in Structured and Partially Observable Environments
This work extensively studies tree-based methods, a well-known family of RL techniques that is also at the core of AlphaGo, a system that beat master players of board games such as Go.
Reinforcement Learning in an Environment Synthetically Augmented with Digital Pheromones
This approach was tested against the historical sequence of Somali maritime pirate attacks from 2005 to mid-2012, enabling a set of autonomous agents representing naval vessels to successfully respond to an average of 333 of the 899 pirate attacks, outperforming the historical record of 139 successes.

References

Showing 1-10 of 164 references.
Learning Without State-Estimation in Partially Observable Markovian Decision Processes
A new framework for learning without state estimation in POMDPs is developed by including stochastic policies in the search space, and by defining the value or utility of a distribution over states.
Approximating Optimal Policies for Partially Observable Stochastic Domains
Smooth Partially Observable Value Approximation (SPOVA) is introduced, a new approximation method that can quickly yield good approximations which improve over time, and which can be combined with reinforcement learning methods, a combination that proved very effective in test cases.
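
The "smooth" in SPOVA refers to replacing the hard max over value vectors with a differentiable soft maximum, so the approximation can be tuned by gradient methods. As one illustration of the idea (a log-sum-exp relaxation, shown here as a generic stand-in rather than the paper's exact parameterization):

    import numpy as np

    def smooth_value(b, alphas, k=8.0):
        """Differentiable approximation of V(b) = max_i alpha_i . b.

        b: belief, shape (S,); alphas: value vectors, shape (n, S).
        As k grows, the log-sum-exp soft max approaches the hard max
        that defines the piecewise-linear optimal value function.
        """
        dots = alphas @ b
        m = np.max(k * dots)                 # shift for numerical stability
        return (m + np.log(np.sum(np.exp(k * dots - m)))) / k
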
Experimental Results on Learning Stochastic Memoryless Policies for Partially Observable Markov Decision Processes
This work presents a variation of this algorithm which learns a locally optimal stochastic memoryless policy, discusses its implementation, and demonstrates its viability using four test problems.
Active Learning in Partially Observable Markov Decision Processes
Results show good performance of the algorithm even in large problems: the most useful parameters of the model are learned quickly, and the agent still accumulates high reward throughout the process.
Computing Optimal Policies for Partially Observable Decision Processes Using Compact Representations
This paper uses Bayesian networks with structured conditional probability matrices to represent POMDPs, and uses this model to structure the belief space for POMDP algorithms, allowing irrelevant distinctions to be ignored.
Learning Policies for Partially Observable Environments: Scaling Up
This paper discusses several simple solution methods and shows that all are capable of finding near-optimal policies for a selection of extremely small POMDPs taken from the learning literature, but that none are able to solve a slightly larger and noisier problem based on robot navigation.
Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems
This work proposes and analyzes a new learning algorithm for a certain class of non-Markov decision problems; the algorithm operates in the space of stochastic policies, which can yield a policy that performs considerably better than any deterministic policy.
Value-Function Approximations for Partially Observable Markov Decision Processes
  • M. Hauskrecht, J. Artif. Intell. Res., 2000
This work surveys various approximation methods, analyzes their properties and relations, provides some new insights into their differences, and presents a number of new approximation methods and novel refinements of existing techniques.
Bayes-Adaptive POMDPs
This work introduces a new mathematical model, the Bayes-Adaptive POMDP, which can be finitely approximated while preserving the value function, and describes approximations for belief tracking and planning in this model.
Planning and Acting in Partially Observable Stochastic Domains
A novel algorithm for solving POMDPs offline is outlined, along with how, in some cases, a finite-memory controller can be extracted from the solution to a POMDP.
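
Offline POMDP solvers of this kind typically return the value function as a finite set of alpha vectors, each associated with an action; acting at a belief then reduces to a max over dot products. A minimal sketch under that standard representation (names are illustrative):

    import numpy as np

    def greedy_action(b, alphas, actions):
        """Given alpha vectors (rows of `alphas`, shape (n, S)) and the
        action actions[i] associated with each, return the greedy
        action and value at belief b: V(b) = max_i alpha_i . b."""
        values = alphas @ b
        i = int(np.argmax(values))
        return actions[i], values[i]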