# Integrating Sample-Based Planning and Model-Based Reinforcement Learning

@inproceedings{Walsh2010IntegratingSP, title={Integrating Sample-Based Planning and Model-Based Reinforcement Learning}, author={Thomas J. Walsh and Sergiu Goschin and Michael L. Littman}, booktitle={AAAI}, year={2010} }

Recent advancements in model-based reinforcement learning have shown that the dynamics of many structured domains (e.g., DBNs) can be learned with tractable sample complexity, despite their exponentially large state spaces. [...] To do so, we define sufficient criteria for a sample-based planner to be used in such a learning system and analyze two popular sample-based approaches from the literature.
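The integration the paper studies can be sketched as a model learner that, rather than being solved exactly, exposes a generative-model interface for a sample-based planner to query. This is a minimal illustrative sketch, not the paper's method: the tabular `LearnedModel` stands in for a factored (DBN) learner, and all names here are hypothetical.

```python
import random

class LearnedModel:
    """Tabular maximum-likelihood stand-in for a factored (DBN) model learner."""
    def __init__(self):
        self.outcomes = {}                    # (s, a) -> list of observed (s', r)

    def observe(self, s, a, s2, r):
        # Record one experienced transition for the maximum-likelihood model.
        self.outcomes.setdefault((s, a), []).append((s2, r))

    def sample(self, s, a):
        # A sample-based planner only needs this generative-model call,
        # never the full (exponentially large) transition matrix.
        return random.choice(self.outcomes[(s, a)])

model = LearnedModel()
model.observe(0, 1, 1, 0.5)                   # one experienced transition
```

The point of the interface is that the planner's cost scales with the number of samples drawn, not with the size of the state space the learned model implicitly represents.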

## 111 Citations

Efficient planning in R-max

- Computer Science, AAMAS
- 2011

By exploiting the specific nature of the planning problem in the considered reinforcement learning algorithms, it is shown how these planning algorithms can be improved.

TEXPLORE: real-time sample-efficient reinforcement learning for robots

- Computer Science, Machine Learning
- 2012

The use of robots in society could be expanded by using reinforcement learning (RL) to allow robots to learn and adapt to new situations online. RL is a paradigm for learning sequential decision…

Model-Based Reinforcement Learning in Continuous Environments Using Real-Time Constrained Optimization

- Computer Science, AAAI
- 2015

The approach combines model-based reinforcement learning with recent advances in approximate optimal control, resulting in a bounded-rationality agent that makes decisions in real time by efficiently solving a sequence of constrained optimization problems on learned sparse Gaussian process models.

RTMBA: A Real-Time Model-Based Reinforcement Learning Architecture for robot control

- Computer Science, 2012 IEEE International Conference on Robotics and Automation
- 2012

This paper presents a novel parallel architecture for model-based RL that runs in real time by taking advantage of sample-based approximate planning methods and parallelizing the acting, model-learning, and planning processes, so that the acting process is fast enough for typical robot control cycles.

Learning and Using Models

- Computer Science, Reinforcement Learning
- 2012

This chapter surveys some of the types of models used in model-based methods and ways of learning them, as well as methods for planning on these models, and examines the sample efficiency of a few methods, which are highly dependent on having intelligent exploration mechanisms.

Optimistic planning for Markov decision processes

- Computer Science, AISTATS
- 2012

An algorithm related to AO* is considered that optimistically explores a tree representation of the space of closed-loop policies, and the near-optimality of the action it returns after n tree node expansions is analyzed.

Planning in Discrete and Continuous Markov Decision Processes by Probabilistic Programming

- Computer Science, ECML/PKDD
- 2015

This planner constructs approximations to the optimal policy by importance sampling while exploiting knowledge of the MDP model. It has wide applicability across domains ranging from strictly discrete to strictly continuous to hybrid ones, and is argued to be competitive given its generality.

Efficient learning of relational models for sequential decision making

- Computer Science
- 2010

This work presents theoretical and empirical results on learning relational models of web-service descriptions, using a dataflow model called a Task Graph to capture the important connections between the inputs and outputs of services in a workflow. It shows that compact relational models can be efficiently learned from limited amounts of basic data.

A review of optimistic planning in Markov decision processes

- Computer Science
- 2013

The resulting optimistic planning framework integrates several types of optimism previously used in planning, optimization, and reinforcement learning, in order to obtain several intuitive algorithms with good performance guarantees.

Optimistic planning in Markov decision processes

- Computer Science
- 2011

The resulting optimistic planning framework integrates several types of optimism previously used in planning, optimization, and reinforcement learning, in order to obtain several intuitive algorithms with good performance guarantees.

## References

Showing 1-10 of 20 references

A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes

- Computer Science, Machine Learning
- 2004

This paper presents a new algorithm that, given only a generative model (a natural and common type of simulator) for an arbitrary MDP, performs on-line, near-optimal planning with a per-state running time that has no dependence on the number of states.
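The sparse sampling idea can be sketched in a few lines: estimate each action's value by averaging over a fixed number of sampled next states from the generative model, recursing to a fixed depth, so the per-state cost depends on the sampling width and horizon but not on the number of states. The toy `sample` model, `ACTIONS`, and parameter values below are illustrative assumptions, not the paper's construction.

```python
# Minimal sketch of a sparse-sampling planner, assuming only a
# generative model sample(s, a) -> (next_state, reward).
ACTIONS = [0, 1]
GAMMA = 0.9

def sample(s, a):
    # Toy deterministic generative model on 5 states, for illustration only:
    # reward 1 when the walk wraps around to state 0.
    s2 = (s + a) % 5
    return s2, 1.0 if s2 == 0 else 0.0

def q_estimate(s, a, depth, width=4):
    """Average reward + discounted value over `width` sampled transitions."""
    total = 0.0
    for _ in range(width):
        s2, r = sample(s, a)
        total += r + GAMMA * v_estimate(s2, depth - 1, width)
    return total / width

def v_estimate(s, depth, width=4):
    if depth == 0:
        return 0.0                     # horizon reached: truncate recursion
    return max(q_estimate(s, a, depth, width) for a in ACTIONS)

def plan(s, depth=3):
    # On-line planning: pick the greedy action at the current state only.
    return max(ACTIONS, key=lambda a: q_estimate(a=a, s=s, depth=depth))
```

Note the cost of one `plan` call is O((|A| * width)^depth) regardless of how many states the MDP has, which is the property the citation highlights.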

Sample-based learning and search with permanent and transient memories

- Computer Science, ICML '08
- 2008

A reinforcement learning architecture that encompasses both sample-based learning and sample-based search, and that generalises across states during both learning and search, is presented; the resulting algorithm, Dyna-2, is applied to high-performance Computer Go.

Learning the structure of Factored Markov Decision Processes in reinforcement learning problems

- Computer Science, ICML
- 2006

SPITI, an instantiation of SDYNA, is described; it uses incremental decision tree induction to learn the structure of a problem, combined with an incremental version of the Structured Value Iteration algorithm. It can build a factored representation of a reinforcement learning problem and may improve the policy faster than tabular reinforcement learning algorithms.

Reinforcement Learning in Finite MDPs: PAC Analysis

- Computer Science, J. Mach. Learn. Res.
- 2009

The current state-of-the-art for near-optimal behavior in finite Markov Decision Processes with a polynomial number of samples is summarized by presenting bounds for the problem in a unified theoretical framework.

Stochastic dynamic programming with factored representations

- Computer Science, Artif. Intell.
- 2000

A unifying framework for computational reinforcement learning theory

- Computer Science
- 2009

The thesis is that the KWIK learning model provides a flexible, modularized, and unifying way to create and analyze reinforcement-learning algorithms with provably efficient exploration, and that it facilitates the development of new algorithms with smaller sample complexity, which have demonstrated empirically faster learning in real-world problems.

Exploring compact reinforcement-learning representations with linear regression

- Computer Science, UAI
- 2009

It is shown that KWIK linear regression can be used to learn the reward function of a factored MDP and the probabilities of action outcomes in Stochastic STRIPS and Object Oriented MDPs, none of which have been proven to be efficiently learnable in the RL setting before.

Reinforcement Learning: An Introduction

- Computer Science, IEEE Transactions on Neural Networks
- 2005

This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

UCT for Tactical Assault Planning in Real-Time Strategy Games

- Computer Science, IJCAI
- 2009

This paper investigates the use of UCT, a recent Monte-Carlo planning algorithm, for tactical assault planning in real-time strategy games, and presents an evaluation of the approach on a range of tactical assault problems with different objectives in the RTS game Wargus.

Bandit Based Monte-Carlo Planning

- Computer Science, ECML
- 2006

A new algorithm, UCT, is introduced that applies bandit ideas to guide Monte-Carlo planning; it is shown to be consistent, and finite-sample bounds are derived on the estimation error due to sampling.
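The bandit idea at the heart of UCT is UCB1: at each node, select the action maximizing its empirical mean return plus an exploration bonus that shrinks with visit count. The sketch below shows only that selection/update rule on a single-node (stateless two-armed bandit) toy problem, under stated assumptions; a full UCT tree with rollouts is omitted, and the reward probabilities and constant `c` are illustrative.

```python
import math
import random

random.seed(0)  # fixed seed so the stochastic toy run is reproducible

class Node:
    def __init__(self):
        self.counts = {}   # action -> visit count
        self.values = {}   # action -> empirical mean return

    def select(self, actions, c=1.4):
        """UCB1: maximize mean return + c * sqrt(ln(total) / n_a)."""
        total = sum(self.counts.get(a, 0) for a in actions)
        best, best_score = None, -float("inf")
        for a in actions:
            n = self.counts.get(a, 0)
            if n == 0:
                return a                      # try each untried action first
            score = self.values[a] + c * math.sqrt(math.log(total) / n)
            if score > best_score:
                best, best_score = a, score
        return best

    def update(self, a, ret):
        # Incrementally update the running mean for action a.
        n = self.counts.get(a, 0)
        self.counts[a] = n + 1
        self.values[a] = (self.values.get(a, 0.0) * n + ret) / (n + 1)

def ucb_bandit(reward_fn, actions, n_sims=500):
    root = Node()
    for _ in range(n_sims):
        a = root.select(actions)
        root.update(a, reward_fn(a))
    return max(actions, key=lambda a: root.values[a])

# Arm 1 pays off with probability 0.8, arm 0 with probability 0.2.
best = ucb_bandit(
    lambda a: 1.0 if random.random() < (0.8 if a else 0.2) else 0.0,
    [0, 1],
)
```

In UCT proper, one such bandit sits at every tree node along the simulated trajectory, which is what yields the consistency and finite-sample guarantees the citation mentions.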