Efficient Exploration With Latent Structure

  Bethany R. Leffler, Michael L. Littman, Alexander L. Strehl, Thomas J. Walsh. In Robotics: Science and Systems.
When interacting with a new environment, a robot can improve its online performance by efficiently exploring the effects of its actions. The efficiency of exploration can be improved significantly by modeling and using latent structure to generalize experiences. We provide a theoretical development of the problem of exploration with latent structure, analyze several algorithms, and prove matching lower bounds. We demonstrate our algorithmic ideas on a simple robot car repeatedly traversing a…

Efficient Reinforcement Learning with Relocatable Action Models

This paper explores an environment-modeling framework that represents transitions as state-independent outcomes that are common to all states that share the same type and provides an efficient algorithm and experimental results in both simulated and robotic environments.
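The state-independent-outcome idea can be sketched as follows. The grid world, terrain types, and function names here are illustrative assumptions, not the paper's actual API: each (type, action) pair stores a distribution over relocatable outcomes (e.g. displacements) that is shared by every state of that type.

```python
def predict_next(state, action, outcome_model, state_type, apply_outcome):
    """Relocatable action model: states sharing a type share one
    state-independent outcome distribution per action.

    outcome_model maps (type, action) -> {outcome: probability};
    apply_outcome grounds an abstract outcome in a concrete state."""
    t = state_type(state)
    return {apply_outcome(state, o): p
            for o, p in outcome_model[(t, action)].items()}


# Hypothetical example: on "ice", moving North slips East 20% of the time.
outcome_model = {('ice', 'N'): {(0, 1): 0.8, (1, 0): 0.2}}
dist = predict_next((2, 2), 'N', outcome_model,
                    state_type=lambda s: 'ice',
                    apply_outcome=lambda s, o: (s[0] + o[0], s[1] + o[1]))
```

Because the outcome distribution is learned once per type rather than once per state, experience in any state of a type immediately generalizes to all others.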

Relocatable Action Models for Autonomous Navigation

A reinforcement-learning agent, in general, uses information from the environment to determine the value of its actions. Once the agent begins acting in the world, there is no further modification of…

Efficient Learning of Dynamics Models using Terrain Classification

This work demonstrates a system that reliably learns an optimal control policy using this additional terrain information and contrasts it with several systems based on more traditional methods that fail to reliably complete the same task.

Efficient learning of relational models for sequential decision making

This work presents theoretical and empirical results on learning relational models of web-service descriptions using a dataflow model called a Task Graph to capture the important connections between inputs and outputs of services in a workflow, and shows that compact relational models can be efficiently learned from limited amounts of basic data.

Terrain Classification for Learning Accurate Dynamics Models

  • Computer Science
  • 2007
This work focuses on using vision to determine which portions of the terrain will lead the robot to have different dynamics, allowing multiple dynamics models to be learned and thereby making the agent's model of the world more accurate.

Transferable Models for Autonomous Learning

This approach addresses concerns in epigenetic robotics by enabling robots to learn from their experience and adapt their behavior to the environment in which they find themselves instead of requiring hand-tuning by a human designer.

TempLe: Learning Template of Transitions for Sample Efficient Multi-task RL

Template Learning (TempLe) is proposed, a PAC-MDP method for multi-task reinforcement learning that can be applied to tasks with varying state/action spaces without prior knowledge of inter-task mappings, and that achieves much lower sample complexity than single-task learners or state-of-the-art multi-task methods.

A unifying framework for computational reinforcement learning theory

This thesis shows that the KWIK ("Knows What It Knows") learning model provides a flexible, modularized, and unifying way to create and analyze reinforcement-learning algorithms with provably efficient exploration; it facilitates the development of new algorithms with smaller sample complexity, which have demonstrated empirically faster learning in real-world problems.

Compact parametric models for efficient sequential decision making in high-dimensional, uncertain domains

A reinforcement learning (RL) algorithm in which the use of a parametric model allows the algorithm to make close-to-optimal decisions on all but a number of samples that scales polynomially with the dimension, a significant improvement over most prior provably approximately optimal RL algorithms.

Autonomous robot work cell exploration using multisensory eye-in-hand systems

A sensor-based approach to the self-guided robotic exploration of initially partly unknown environments that takes sensing uncertainty into account and enables information-gain-driven missions such as view planning for object recognition or grasp planning.

The role of exploration in learning control

Whenever an intelligent agent learns to control an unknown environment, two opposing objectives have to be balanced: the environment must be explored in order to identify a (sub-)optimal controller, and experience gathered during learning must also be considered for action selection.

Model based Bayesian Exploration

This paper explicitly represents uncertainty about the parameters of the model and builds probability distributions over Q-values based on them, which are used to compute a myopic approximation to the value of information for each action and hence to select the action that best balances exploration and exploitation.

An empirical evaluation of interval estimation for Markov decision processes

  • A. Strehl, M. Littman
  • Computer Science
    16th IEEE International Conference on Tools with Artificial Intelligence
  • 2004
This work takes an empirical approach to evaluating three model-based reinforcement-learning methods; the results indicate that effective exploration can result in dramatic improvements in the observed rate of learning.
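The interval-estimation heuristic studied in such evaluations can be sketched as below. This is a minimal illustration, not the paper's code: it assumes bounded rewards so the confidence width can be written as z/√n, and the function name and parameters are hypothetical.

```python
import math

def ie_action(means, counts, z=1.96):
    """Interval-estimation action selection: choose the action whose
    upper confidence bound on estimated value is largest.

    means[a]  -- empirical mean return of action a
    counts[a] -- number of times action a has been tried
    Untried actions get an infinite bound, so they are tried first."""
    best_a, best_ucb = 0, -math.inf
    for a, (mu, n) in enumerate(zip(means, counts)):
        ucb = math.inf if n == 0 else mu + z / math.sqrt(n)
        if ucb > best_ucb:
            best_a, best_ucb = a, ucb
    return best_a
```

The exploration bias is implicit: a rarely tried action keeps a wide interval and so keeps being selected until its estimate is trustworthy, which is one way "effective exploration" arises from a purely greedy rule.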

Improving Action Selection in MDP's via Knowledge Transfer

This work contributes randomized task perturbation (RTP), an enhancement to action transfer that makes it robust to unrepresentative source tasks; the empirical results show the potential of RTP action transfer to substantially expand the applicability of RL to problems with large action sets.

Efficient Reinforcement Learning in Factored MDPs

We present a provably efficient and near-optimal algorithm for reinforcement learning in Markov decision processes (MDPs) whose transition model can be factored as a dynamic Bayesian network (DBN).
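The DBN factorization underlying such algorithms writes the transition model as a product of per-variable conditionals, P(s′ | s, a) = ∏ᵢ P(s′ᵢ | parentsᵢ(s), a). A minimal sketch, assuming one set of conditional probability tables (CPTs) per action and a hypothetical dict-based CPT encoding:

```python
def factored_transition_prob(state, next_state, cpts, parents):
    """DBN-factored transition probability: P(s' | s) is the product of
    per-variable conditionals, each depending only on that variable's
    parents in the current state.

    cpts[i]    -- dict mapping a tuple of parent values to a
                  distribution (list) over values of next_state[i]
    parents[i] -- indices of variable i's parents in `state`"""
    p = 1.0
    for i, cpt in enumerate(cpts):
        parent_vals = tuple(state[j] for j in parents[i])
        p *= cpt[parent_vals][next_state[i]]
    return p


# Hypothetical two-variable example: variable 0 depends on itself,
# variable 1 depends on both variables.
cpt0 = {(0,): [0.9, 0.1], (1,): [0.2, 0.8]}
cpt1 = {(0, 0): [0.5, 0.5], (0, 1): [0.3, 0.7],
        (1, 0): [1.0, 0.0], (1, 1): [0.0, 1.0]}
p = factored_transition_prob((0, 1), (1, 0), [cpt0, cpt1], [[0], [0, 1]])
```

Learning each small CPT independently is what lets sample complexity scale with the size of the local tables rather than with the full joint state space.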

Action Elimination and Stopping Conditions for Reinforcement Learning

Model-based and model-free variants of the elimination method are presented, with stopping conditions that guarantee the learned policy is approximately optimal with high probability; experiments demonstrate a considerable speedup and added robustness.

Recent Advances in Hierarchical Reinforcement Learning

This work reviews several approaches to temporal abstraction and hierarchical organization that machine learning researchers have recently developed and discusses extensions of these ideas to concurrent activities, multiagent coordination, and hierarchical memory for addressing partial observability.

Efficient reinforcement learning

A new formal model for studying reinforcement learning, based on Valiant's PAC framework, that requires the learner to produce a policy whose expected value from the initial state is ε-close to that of the optimal policy, with probability no less than 1−δ.

R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning

R-MAX is a very simple model-based reinforcement-learning algorithm that attains near-optimal average reward in polynomial time; its analysis formally justifies the "optimism under uncertainty" bias used in many RL algorithms.
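The optimism mechanism can be sketched for the tabular case as follows. This is an illustrative sketch, not the paper's pseudocode: the function name, parameters, and the fictitious-state encoding are assumptions, and planning is done by plain value iteration on the optimistic model.

```python
import numpy as np

def rmax_plan(counts, rewards, transitions, n_states, n_actions,
              m=5, r_max=1.0, gamma=0.95, iters=200):
    """Value iteration on the optimistic R-MAX model.

    counts[s, a]         -- visit count for the pair (s, a)
    rewards[s, a]        -- summed observed reward for (s, a)
    transitions[s, a, :] -- next-state counts for (s, a)
    Pairs tried fewer than m times are treated as leading to a
    fictitious state s* that yields r_max forever."""
    S = n_states + 1                     # index n_states is s*
    V = np.zeros(S)
    R = np.full((S, n_actions), r_max)   # optimistic by default
    T = np.zeros((S, n_actions, S))
    T[:, :, n_states] = 1.0              # unknown pairs (and s*) go to s*
    for s in range(n_states):
        for a in range(n_actions):
            if counts[s, a] >= m:        # "known": use empirical estimates
                R[s, a] = rewards[s, a] / counts[s, a]
                T[s, a, :n_states] = transitions[s, a] / counts[s, a]
                T[s, a, n_states] = 0.0
    for _ in range(iters):
        Q = R + gamma * T.dot(V)         # shape (S, n_actions)
        V = Q.max(axis=1)
    return Q[:n_states]                  # act greedily w.r.t. these Q-values
```

Because every under-explored pair looks maximally rewarding, the greedy policy is steered toward unknown regions until they become known, which is exactly the bias the paper's analysis justifies.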

A Quantitative Study of Hypothesis Selection