Markov decision processes (MDPs) have proven to be popular models for decision-theoretic planning, but standard dynamic programming algorithms for solving MDPs rely on explicit, state-based specifications and computations. To alleviate the combinatorial problems associated with such methods, we propose new representational and computational techniques for …
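As a point of reference, here is a minimal sketch of the flat, state-enumerating value iteration that this abstract contrasts against. The tiny MDP (its states, actions, transition table P, reward table R, and discount gamma) is hypothetical, not taken from the paper.

```python
# Standard value iteration over an explicitly enumerated state space --
# the baseline whose combinatorial cost structured methods try to avoid.
# The small MDP below is an illustrative assumption.

states = [0, 1, 2]
actions = ["a", "b"]
gamma = 0.9

# P[s][a] -> list of (next_state, probability); R[s][a] -> reward
P = {
    0: {"a": [(0, 0.5), (1, 0.5)], "b": [(2, 1.0)]},
    1: {"a": [(1, 1.0)],           "b": [(0, 0.3), (2, 0.7)]},
    2: {"a": [(2, 1.0)],           "b": [(2, 1.0)]},
}
R = {
    0: {"a": 0.0, "b": 1.0},
    1: {"a": 0.5, "b": 0.0},
    2: {"a": 0.0, "b": 0.0},
}

V = {s: 0.0 for s in states}
for _ in range(100):  # fixed sweep count for brevity; use a tolerance in practice
    V = {
        s: max(
            R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
            for a in actions
        )
        for s in states
    }
print(V)
```

Every state appears explicitly in P, R, and V, which is exactly the representation whose size blows up combinatorially as state variables are added.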
Markov decision processes (MDPs) have recently been applied to the problem of decision-theoretic planning. While traditional methods for solving MDPs are often practical for small state spaces, their effectiveness for large AI planning problems is questionable. We present an algorithm, called structured policy iteration (SPI), that constructs …
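The abstract is truncated before describing SPI itself, so the following is only a sketch of the flat, tabular policy iteration that SPI reorganizes around structured representations. It reuses the hypothetical P, R, states, and actions from the previous sketch.

```python
# Flat (tabular) policy iteration: alternate policy evaluation and
# greedy improvement until the policy is stable. This is the classical
# procedure, not the structured SPI variant the paper presents.

def policy_evaluation(policy, V, P, R, gamma=0.9, sweeps=100):
    # Successive-approximation evaluation of a fixed policy.
    for _ in range(sweeps):
        V = {
            s: R[s][policy[s]]
            + gamma * sum(p * V[s2] for s2, p in P[s][policy[s]])
            for s in V
        }
    return V

def policy_iteration(states, actions, P, R, gamma=0.9):
    policy = {s: actions[0] for s in states}
    V = {s: 0.0 for s in states}
    while True:
        V = policy_evaluation(policy, V, P, R, gamma)
        # Greedy improvement over the evaluated value function.
        new_policy = {
            s: max(
                actions,
                key=lambda a: R[s][a]
                + gamma * sum(p * V[s2] for s2, p in P[s][a]),
            )
            for s in states
        }
        if new_policy == policy:  # converged: policy is stable
            return policy, V
        policy = new_policy
```

Both the evaluation and improvement steps iterate over every state individually; SPI's contribution is to perform these steps over structured descriptions instead.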
A central problem in learning in complex environments is balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the classical notion of Value of Information: the expected improvement in future decision quality that might arise from the information acquired by …
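To make the Value of Information idea concrete, here is a hedged Monte Carlo sketch of the myopic VOI of perfect information in a Bernoulli bandit with Beta posteriors. The bandit setup, the priors, and the function name are illustrative assumptions, not the paper's exact model.

```python
# Myopic value of perfect information about one action's true mean:
# the expected improvement in the best achievable decision, averaged
# over the current Bayesian posterior.
import random

def voi_perfect_info(posteriors, arm, n_samples=10000):
    """Expected gain from learning `arm`'s true mean exactly.

    posteriors: list of (alpha, beta) Beta parameters, one per arm.
    """
    means = [a / (a + b) for a, b in posteriors]
    best_other = max(m for i, m in enumerate(means) if i != arm)
    a, b = posteriors[arm]
    gain = 0.0
    for _ in range(n_samples):
        mu = random.betavariate(a, b)  # posterior sample of the true mean
        # With the information we would pick max(mu, best_other);
        # without it we pick max(means). The difference is one VOI sample.
        gain += max(mu, best_other) - max(means)
    return gain / n_samples

posteriors = [(3, 1), (1, 1)]  # arm 0 looks good; arm 1 is unexplored
print([voi_perfect_info(posteriors, i) for i in range(2)])
```

The unexplored arm typically receives the higher VOI, which is precisely the quantity that can justify exploring an action whose current estimate looks worse.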
There has been considerable work in AI on decision-theoretic planning and planning under uncertainty. Unfortunately, all of this work suffers from one or more of the following limitations: 1) it relies on very simple models of actions and time, 2) it assumes that uncertainty is manifested in discrete action outcomes, and 3) it is only practical for very …
Markov decision processes (MDPs) have recently been proposed as useful conceptual models for understanding decision-theoretic planning. However, the utility of the associated computational methods remains open to question: most algorithms for computing optimal policies require explicit enumeration of the state space of the planning problem. We propose an …
Exploration for robotic mapping is typically handled using greedy entropy reduction. Here we show how to apply information lookahead planning to a challenging instance of this problem, in which an Autonomous Underwater Vehicle (AUV) maps hydrothermal vents. Given a simulation of vent behaviour, we derive an observation function to turn the planning for …
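For contrast with the paper's lookahead approach, here is a sketch of the greedy entropy-reduction baseline the abstract mentions: pick the cell whose noisy binary observation is expected to reduce map entropy the most. The occupancy grid and the sensor model (true-positive and false-positive rates) are illustrative assumptions.

```python
# Greedy one-step entropy reduction for a binary occupancy map:
# choose the cell with the largest expected information gain from a
# single noisy detection. Lookahead planning goes beyond this myopic rule.
import math

def entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def posterior(p, hit, tp=0.9, fp=0.1):
    # Bayes update of an occupancy probability given a binary detection.
    like = (tp if hit else 1 - tp) * p
    alt = (fp if hit else 1 - fp) * (1 - p)
    return like / (like + alt)

def expected_info_gain(p, tp=0.9, fp=0.1):
    p_hit = tp * p + fp * (1 - p)  # marginal probability of a detection
    return entropy(p) - (
        p_hit * entropy(posterior(p, True, tp, fp))
        + (1 - p_hit) * entropy(posterior(p, False, tp, fp))
    )

belief = [0.5, 0.2, 0.9, 0.05]  # per-cell occupancy probabilities
best = max(range(len(belief)), key=lambda i: expected_info_gain(belief[i]))
print("sense cell", best)  # picks the most informative cell under this model
```

The greedy rule always senses the currently most informative cell; information lookahead instead evaluates sequences of observations, which matters when, as with vent plumes, one measurement changes where the next is worth taking.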
We describe an approach for exploiting structure in Markov decision processes with continuous state variables. At each step of the dynamic programming, the state space is dynamically partitioned into regions where the value function is the same throughout the region. We first describe the algorithm for piecewise constant representations. We then extend it …
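A simplified sketch of the piecewise-constant idea follows: a value function over a 1-D continuous state stored as (left_edge, value) pieces, re-compressed after each backup by merging adjacent regions that share a value. The 1-D state, the grid-based backup, and the dynamics and reward used in the usage example are all assumptions made for illustration.

```python
# Piecewise-constant value function over [0, 1), stored as sorted
# (left_edge, value) pieces; each piece covers [edge, next_edge).

def value_at(pieces, x):
    v = pieces[0][1]
    for edge, val in pieces:
        if x >= edge:
            v = val
    return v

def merge(pieces):
    # Collapse adjacent regions that share the same value.
    out = [pieces[0]]
    for edge, val in pieces[1:]:
        if val != out[-1][1]:
            out.append((edge, val))
    return out

def backup(pieces, actions, f, R, gamma=0.9, grid=1000):
    # One Bellman backup, V'(x) = max_a [R(x, a) + gamma * V(f(x, a))],
    # evaluated on a fine grid and then re-compressed into regions.
    # (The paper partitions exactly; the grid is a sketch-level shortcut.)
    xs = [i / grid for i in range(grid)]
    new = [
        (x, max(R(x, a) + gamma * value_at(pieces, f(x, a)) for a in actions))
        for x in xs
    ]
    return merge(new)

# Hypothetical usage: deterministic drift dynamics, goal region x >= 0.8.
pieces = [(0.0, 0.0)]
for _ in range(20):
    pieces = backup(
        pieces,
        actions=[0.0, 0.1],
        f=lambda x, a: min(0.999, x + a),
        R=lambda x, a: 1.0 if x >= 0.8 else 0.0,
    )
print(len(pieces), "regions after 20 backups")
```

The payoff of the representation is visible in the final print: the value function is described by a handful of regions rather than one value per grid point.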