# Partially Observable Markov Decision Processes for Artificial Intelligence

@inproceedings{Kaelbling1995PartiallyOM, title={Partially Observable Markov Decision Processes for Artificial Intelligence}, author={Leslie Pack Kaelbling and Michael L. Littman and Anthony R. Cassandra}, booktitle={Reasoning with Uncertainty in Robotics}, year={1995} }

In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. In many cases, we have developed new ways of viewing the problem that are, perhaps, more consistent with the AI perspective. We begin by introducing the theory of Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). We then outline a novel algorithm for solving POMDPs off line and show how, in many cases…
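As a rough illustration of what "partially observable" means in practice, the sketch below shows the standard Bayes-filter belief update that an agent in a POMDP can use to track a probability distribution over hidden states. It is not taken from the paper: the function name `belief_update`, the table layouts `T[a][s][s2]` and `O[a][s2][o]`, and the two-state numbers are assumptions chosen only for this example.

```python
def belief_update(belief, action, observation, T, O):
    """Return the posterior belief after taking `action` and seeing `observation`.

    Assumed (hypothetical) table layouts:
      T[a][s][s2] = P(s2 | s, a)   transition model
      O[a][s2][o] = P(o | s2, a)   observation model
    """
    n = len(belief)
    # Predict the next-state distribution, then weight it by the observation likelihood.
    unnormalized = [
        O[action][s2][observation]
        * sum(T[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]


# Illustrative two-state model with a single action (index 0) that leaves the
# state unchanged and yields an observation matching the state 85% of the time.
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[[0.85, 0.15], [0.15, 0.85]]]

posterior = belief_update([0.5, 0.5], action=0, observation=0, T=T, O=O)
print(posterior)
```

Starting from a uniform belief, a single observation pointing at state 0 shifts most of the probability mass onto that state; repeated updates of this kind are what let a policy be defined over beliefs rather than over the unobserved state.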

## 36 Citations

### Application of POMDPs to Cognitive Radar

- Computer Science
- 2019 53rd Asilomar Conference on Signals, Systems, and Computers
- 2019

Partially observable Markov decision processes (POMDPs) are evaluated herein as a framework for decision-making in radar scenarios, and value iteration is examined as a method for computing an optimal decision policy with a POMDP.

### Markov decision processes with noise-corrupted and delayed state observations

- Mathematics
- J. Oper. Res. Soc.
- 1999

It is shown that in the limit as k → ∞ the problem is equivalent to the completely unobserved case, and a measure is presented of the marginal value of receiving state observations delayed by (k - 1) stages rather than by k stages.

### A Model of Planning, Action and Interpretation with Goal Reasoning

- Computer Science
- 2016

This work formalizes the notion that goal formulation and goal change are themselves major parts of the problem-solving process and includes in the model not just planning and plan execution but also interpretation of the environment as plans are executed and exogenous events occur.

### Decision-Theoretic Planning for Autonomous Robotic Surveillance

- Computer Science
- Applied Intelligence
- 2004

A decision-theoretic strategy for surveillance is introduced as a first step towards automating the planning of the movement of an autonomous surveillance robot, and this strategy is compared with other proposed strategies.

### Finding minimal observation set for finite (belief) state set in non-deterministic planning

- Computer Science
- 2008 International Conference on Machine Learning and Cybernetics
- 2008

This paper presents a method to reduce observations for any state set or belief state set, wherever the (belief) state set can be limited to a finite one.

### Semi-Supervised Learning of Decision-Making Models for Human-Robot Collaboration

- Computer Science
- CoRL
- 2019

ADACORL is presented, a framework to specify decision-making models and generate robot behavior for interaction in sequential tasks with known task objectives. The work demonstrates that its specification approach, despite using significantly fewer labels, generates models (and policies) that perform equally well or better than models learned with supervised data.

### Bayesian Robots Programming

- Computer Science
- 2000

This work proposes a new method to program robots based on Bayesian inference and learning and presents instances of behavior combinations, sensor fusion, hierarchical behavior composition, situation recognition and temporal sequencing.

### Les cahiers du laboratoire Leibniz Bayesian Robots Programming

- Computer Science
- 2000

A new method to program robots based on Bayesian inference and learning is proposed, demonstrated through a succession of increasingly complex experiments, which comprises the steps in the incremental development of a complex robot program.

### Bayesian Robot Programming

- Computer Science

This work proposes a new method to program robots based on Bayesian inference and learning called BRP for Bayesian Robot Programming, which is demonstrated through a succession of increasingly complex experiments.

### Bayesian Robot Programming

- Computer Science
- Auton. Robots
- 2004

This work proposes a new method to program robots based on Bayesian inference and learning called BRP for Bayesian Robot Programming, which is demonstrated through a succession of increasingly complex experiments.

## References

### State of the Art—A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms

- Computer Science
- 1982

A wide range of models in such areas as quality control, machine maintenance, internal auditing, learning, and optimal stopping are discussed within the POMDP-framework.

### A survey of solution techniques for the partially observed Markov decision process

- Mathematics
- Ann. Oper. Res.
- 1991

Several computational procedures are surveyed for the POMDP, which generalizes the standard, completely observed Markov decision process. The procedures presented are convergence-accelerating variants of, or approximations to, the Smallwood-Sondik algorithm, and new research directions involving heuristic search are discussed.

### A survey of algorithmic methods for partially observed Markov decision processes

- Computer Science
- 1991

Several approximation methodologies are reviewed that have the potential to generate computationally feasible, high precision solutions for solving discrete-time, finite POMDPs over both finite and infinite horizons.

### The Witness Algorithm: Solving Partially Observable Markov Decision Processes

- Computer Science
- 1994

It is argued that the witness algorithm is superior to existing algorithms for solving POMDP problems in an important complexity-theoretic sense.

### Exploiting Structure in Policy Construction

- Computer Science
- IJCAI
- 1995

This work presents an algorithm, called structured policy iteration (SPI), that constructs optimal policies without explicit enumeration of the state space. It retains the fundamental computational steps of the commonly used modified policy iteration algorithm, but exploits the variable and propositional independencies reflected in a temporal Bayesian network representation of MDPs.

### Using Abstractions for Decision-Theoretic Planning with Time Constraints

- Computer Science
- AAAI
- 1994

This work explores a method for generating abstractions that allow approximately optimal policies to be constructed; computational gains are achieved through reduction of the state space.

### The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs

- Mathematics
- Oper. Res.
- 1978

The paper develops easily implemented approximations to stationary policies based on finitely transient policies and shows that the concave hull of an approximation can be included in the well-known Howard policy improvement algorithm with subsequent convergence.

### Algorithms for partially observable Markov decision processes

- Computer Science
- 1989

The thesis develops methods to solve discrete-time finite-state partially observable Markov decision processes and proves that the policy improvement step in the iterative discretization procedure can be replaced by the approximation version of the linear support algorithm.

### The Optimal Control of Partially Observable Markov Processes over a Finite Horizon

- Mathematics
- Oper. Res.
- 1973

If there are only a finite number of control intervals remaining, then the optimal payoff function is a piecewise-linear, convex function of the current state probabilities of the internal Markov process, and an algorithm for utilizing this property to calculate the optimal control policy and payoff function for any finite horizon is outlined.
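The piecewise-linear, convex property described above is commonly written as a maximum over a finite set of linear functions of the belief, often called alpha vectors. The sketch below illustrates only that representation; the function name `pwlc_value` and the three vectors are hypothetical values chosen for the example, not results from the paper.

```python
def pwlc_value(belief, alpha_vectors):
    """Evaluate V(b) = max over alpha of (alpha . b).

    Each alpha vector assigns one value per state, so each dot product is a
    linear function of the belief; the maximum of linear functions is
    piecewise linear and convex.
    """
    return max(
        sum(a * b for a, b in zip(alpha, belief))
        for alpha in alpha_vectors
    )


# Hypothetical alpha vectors for a two-state problem (illustrative numbers only).
alphas = [
    [1.0, 0.0],  # pays off when the process is in state 0
    [0.0, 1.0],  # pays off when the process is in state 1
    [0.6, 0.6],  # a "hedging" option that is best under high uncertainty
]

v_certain = pwlc_value([1.0, 0.0], alphas)
v_uncertain = pwlc_value([0.5, 0.5], alphas)
print(v_certain, v_uncertain)
```

With these numbers the hedging vector wins at the uniform belief, while the first vector wins when state 0 is certain. The value at the midpoint belief is no larger than the average of the values at the two extreme beliefs, which is the convexity the summary refers to.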