Leslie Pack Kaelbling

Learn More
In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (mdps) and partially observable mdps (pomdps). We then outline a novel algorithm for solving pomdps o line and show how, in some cases, a(More)
This paper surveys the eld of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the eld and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error(More)
Partially observable Markov decision processes (pomdp's) model decision problems in which an agent tries to maximize its reward in the face of limited and/or noisy sensor feedback. While the study of pomdp's is motivated by a need to address realistic problems, existing techniques for nding optimal behavior do not appear to scale well and have been unable(More)
In this paper, we describe the partially observable Markov decision process (pomdp) approach to nding optimal or near-optimal control strategies for partially observable stochastic environments, given a complete model of the environment. The pomdp approach was originally developed in the operations research community and provides a formal basis for planning(More)
Discrete Bayesian models have been used to model uncertainty for mobile-robot navigation, but the question of how actions should be chosen remains largely unexplored. This paper presents the optimal solution to the problem, formulated as a partially observable Markov decision process. Since solving for the optimal control policy is intractable, in general,(More)
Lifted inference algorithms exploit repeated structure in probabilistic models to answer queries efficiently. Previous work such as de Salvo Braz et al.’s first-order variable elimination (FOVE) has focused on the sharing of potentials across interchangeable random variables. In this paper, we also exploit interchangeability within individual potentials by(More)
In this article, we work towards the goal of developing agents that can learn to act in complex worlds. We develop a probabilistic, relational planning rule representation that compactly models noisy, nondeterministic action effects, and show how such rules can be effectively learned. Through experiments in simple planning domains and a 3D simulated blocks(More)
Delayed re in fo rcement l ea rn ing is an a t t r ac t i ve f r a m e w o r k for t he unsuperv ised lea rn ing of ac t ion pol ic ies for au tonomous agents. Some ex i s t i ng delayed re in fo rcement l ea rn ing techniques have shown promise in s imp le doma ins . However , a number of hurd les mus t be passed before they are app l icab le to real is t(More)
Cooperative games are those in which both agents share the same payoff structure. Valuebased reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperative games, but they only apply when the game state is completely observable to both agents. Policy search methods are a reasonable alternative to value-based(More)