Corpus ID: 765421

Integrating Sample-Based Planning and Model-Based Reinforcement Learning

@inproceedings{Walsh2010IntegratingSP,
  title={Integrating Sample-Based Planning and Model-Based Reinforcement Learning},
  author={Thomas J. Walsh and Sergiu Goschin and Michael L. Littman},
  booktitle={AAAI},
  year={2010}
}
Recent advancements in model-based reinforcement learning have shown that the dynamics of many structured domains (e.g., DBNs) can be learned with tractable sample complexity, despite their exponentially large state spaces. We integrate these model learners with sample-based planners, whose running time does not depend on the size of the state space. To do so, we define sufficient criteria for a sample-based planner to be used in such a learning system and analyze two popular sample-based approaches from the literature.
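
To make the integration concrete: the learned model exposes a generative-model interface (sample a reward and next state for a queried state-action pair) that the planner calls, so planning cost scales with sampling width and depth rather than with the number of states. The sketch below is illustrative only, not the paper's algorithm; LearnedModel, plan_action, and all constants are hypothetical names, assuming a tabular maximum-likelihood model and plain Monte-Carlo rollouts.

    import random
    from collections import defaultdict

    class LearnedModel:
        """Hypothetical tabular maximum-likelihood model built from observed transitions."""
        def __init__(self):
            self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
            self.reward = defaultdict(float)                     # (s, a) -> mean reward
            self.visits = defaultdict(int)

        def update(self, s, a, r, s2):
            self.visits[(s, a)] += 1
            self.reward[(s, a)] += (r - self.reward[(s, a)]) / self.visits[(s, a)]
            self.counts[(s, a)][s2] += 1

        def sample(self, s, a):
            """Generative-model interface: draw (reward, next state) from the learned dynamics."""
            total = self.visits[(s, a)]
            if total == 0:
                return 0.0, s  # unvisited pair: zero-reward self-loop as a placeholder
            draw = random.randrange(total)
            for s2, n in self.counts[(s, a)].items():
                draw -= n
                if draw < 0:
                    return self.reward[(s, a)], s2

    def rollout(model, s, actions, depth, gamma):
        """Estimate a return from s by a uniformly random rollout through the model."""
        ret, discount = 0.0, 1.0
        for _ in range(depth):
            r, s = model.sample(s, random.choice(actions))
            ret += discount * r
            discount *= gamma
        return ret

    def plan_action(model, s, actions, depth=10, n_rollouts=50, gamma=0.95):
        """Pick the action with the best average sampled return; cost is independent of |S|."""
        def q(a):
            est = 0.0
            for _ in range(n_rollouts):
                r, s2 = model.sample(s, a)
                est += r + gamma * rollout(model, s2, actions, depth - 1, gamma)
            return est / n_rollouts
        return max(actions, key=q)

In a learner like R-max, plan_action would be re-run after each model update; the zero-reward self-loop used for unvisited pairs is a placeholder that an exploring agent would replace with an optimistic value (see the sketch under "Efficient planning in R-max" below).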

Citations

Efficient planning in R-max
TLDR
By exploiting the specific nature of the planning problem inside the considered reinforcement learning algorithms, it is shown how these planning algorithms can be improved.
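
For intuition, the key device in R-max is optimism under uncertainty: any state-action pair tried fewer than m times is treated as if it paid the maximum reward forever, so the planner is steered toward exploring it. A minimal sketch of that substitution, reusing the hypothetical LearnedModel above; the threshold M and bound R_MAX are illustrative assumptions:

    M = 5          # "known-ness" threshold (illustrative)
    R_MAX = 1.0    # assumed upper bound on one-step reward

    def optimistic_sample(model, s, a):
        """R-max substitution: unknown (s, a) pairs look like an absorbing state
        that pays R_MAX, so a planner is drawn toward exploring them."""
        if model.visits[(s, a)] < M:
            return R_MAX, s            # fictitious maximally rewarding self-loop
        return model.sample(s, a)      # known pairs use the learned dynamics
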
TEXPLORE: real-time sample-efficient reinforcement learning for robots
The use of robots in society could be expanded by using reinforcement learning (RL) to allow robots to learn and adapt to new situations online. RL is a paradigm for learning sequential decision making.
Model-Based Reinforcement Learning in Continuous Environments Using Real-Time Constrained Optimization
TLDR
The approach combines model-based reinforcement learning with recent advances in approximate optimal control, yielding a bounded-rationality agent that makes decisions in real time by efficiently solving a sequence of constrained optimization problems on learned sparse Gaussian process models.
RTMBA: A Real-Time Model-Based Reinforcement Learning Architecture for robot control
TLDR
This paper presents a novel parallel architecture for model-based RL that runs in real time by taking advantage of sample-based approximate planning methods and by parallelizing the acting, model-learning, and planning processes, such that the acting process is fast enough for typical robot control cycles.
Learning and Using Models
TLDR
This chapter surveys some of the types of models used in model-based methods and ways of learning them, as well as methods for planning on these models, and examines the sample efficiency of a few methods, which are highly dependent on having intelligent exploration mechanisms.
Optimistic planning for Markov decision processes
TLDR
An algorithm related to AO* is considered that optimistically explores a tree representation of the space of closed-loop policies, and the near-optimality of the action it returns after n tree node expansions is analyzed.
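
The flavor of these planners can be conveyed by the deterministic special case (the MDP version analyzed in such papers is more involved): repeatedly expand the frontier leaf whose optimistic bound, the discounted reward collected so far plus gamma^(d+1)/(1-gamma), is largest, assuming rewards in [0, 1]. A rough sketch under those assumptions, where sim is a hypothetical deterministic generative model:

    import heapq

    def optimistic_plan(sim, s0, actions, n_expansions, gamma=0.95):
        """Sketch of optimistic planning for deterministic dynamics: always
        expand the leaf with the largest upper bound on the return through it."""
        # frontier entries: (-bound, tiebreak, state, first_action, depth, return_so_far)
        frontier = [(-1.0 / (1.0 - gamma), 0, s0, None, 0, 0.0)]
        best_return, best_action = float("-inf"), actions[0]
        tiebreak = 1
        for _ in range(n_expansions):
            _, _, s, first, d, ret = heapq.heappop(frontier)
            for a in actions:
                r, s2 = sim(s, a)                   # one deterministic transition
                ret2 = ret + (gamma ** d) * r
                root_a = first if first is not None else a
                if ret2 > best_return:              # track the best discovered return
                    best_return, best_action = ret2, root_a
                bound = ret2 + (gamma ** (d + 1)) / (1.0 - gamma)  # optimism: all future rewards = 1
                heapq.heappush(frontier, (-bound, tiebreak, s2, root_a, d + 1, ret2))
                tiebreak += 1
        return best_action
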
Planning in Discrete and Continuous Markov Decision Processes by Probabilistic Programming
TLDR
This planner constructs approximations to the optimal policy by importance sampling while exploiting knowledge of the MDP model; it applies to domains ranging from strictly discrete through hybrid to strictly continuous, and is argued to be competitive given its generality.
Efficient learning of relational models for sequential decision making
TLDR
This work presents theoretical and empirical results on learning relational models of web-service descriptions using a dataflow model called a Task Graph to capture the important connections between inputs and outputs of services in a workflow, and shows that compact relational models can be efficiently learned from limited amounts of basic data.
A review of optimistic planning in Markov decision processes
TLDR
The resulting optimistic planning framework integrates several types of optimism previously used in planning, optimization, and reinforcement learning, in order to obtain several intuitive algorithms with good performance guarantees.

References

Showing 1-10 of 20 references
A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes
TLDR
This paper presents a new algorithm that, given only a generative model (a natural and common type of simulator) for an arbitrary MDP, performs on-line, near-optimal planning with a per-state running time that has no dependence on the number of states.
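
The recursion behind this result is compact enough to sketch: estimate Q(s, a) by drawing C sampled successors per action from the generative model and recursing to horizon H, so total work is on the order of (C|A|)^H with no dependence on |S|. A toy rendering under those assumptions (sim, the defaults, and the function names are illustrative):

    def sparse_sampling_q(sim, s, a, actions, depth, width, gamma=0.95):
        """Sparse-sampling-style recursion: estimate Q(s, a) from `width` sampled
        successors per action; cost is (width * |A|)^depth, independent of |S|."""
        if depth == 0:
            return 0.0
        total = 0.0
        for _ in range(width):
            r, s2 = sim(s, a)   # generative model: one sampled transition
            v2 = max(sparse_sampling_q(sim, s2, b, actions, depth - 1, width, gamma)
                     for b in actions)
            total += r + gamma * v2
        return total / width

    def sparse_sampling_action(sim, s, actions, depth=3, width=4):
        return max(actions, key=lambda a: sparse_sampling_q(sim, s, a, actions, depth, width))
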
Sample-based learning and search with permanent and transient memories
TLDR
A reinforcement learning architecture, Dyna-2, is presented that encompasses both sample-based learning and sample-based search and that generalises across states during both learning and search; it is applied to high-performance Computer Go.
Learning the structure of Factored Markov Decision Processes in reinforcement learning problems
TLDR
SPITI, an instantiation of SDYNA, is described: it uses incremental decision-tree induction to learn the structure of a problem, combined with an incremental version of the Structured Value Iteration algorithm, so that it can build a factored representation of a reinforcement learning problem and may improve the policy faster than tabular reinforcement learning algorithms.
Reinforcement Learning in Finite MDPs: PAC Analysis
TLDR
The current state-of-the-art for near-optimal behavior in finite Markov Decision Processes with a polynomial number of samples is summarized by presenting bounds for the problem in a unified theoretical framework.
A unifying framework for computational reinforcement learning theory
TLDR
The thesis argues that the KWIK learning model provides a flexible, modularized, and unifying way to create and analyze reinforcement learning algorithms with provably efficient exploration, and that it facilitates the development of new algorithms with smaller sample complexity, which have demonstrated empirically faster learning in real-world problems.
Exploring compact reinforcement-learning representations with linear regression
TLDR
It is shown that KWIK linear regression can be used to learn the reward function of a factored MDP and the probabilities of action outcomes in Stochastic STRIPS and Object Oriented MDPs, none of which have been proven to be efficiently learnable in the RL setting before.
Reinforcement Learning: An Introduction
TLDR
This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
UCT for Tactical Assault Planning in Real-Time Strategy Games
TLDR
This paper investigates the use of UCT, a recent Monte-Carlo planning algorithm, for tactical assault planning in real-time strategy games, and presents an evaluation of the approach on a range of tactical assault problems with different objectives in the RTS game Wargus.
Bandit Based Monte-Carlo Planning
TLDR
A new algorithm, UCT, is introduced that applies bandit ideas to guide Monte-Carlo planning; it is shown to be consistent, and finite-sample bounds are derived on the estimation error due to sampling.
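
The bandit idea in question is UCB1 applied at each internal node of the search tree: during every simulated episode, descend by picking the child action that maximizes its mean return plus an exploration bonus. A sketch of just that selection rule; the array layout and the constant c are illustrative assumptions:

    import math

    def ucb_select(node_n, child_n, child_q, c=1.4):
        """UCT's core step: choose the child maximizing mean value plus a UCB1 bonus."""
        def score(i):
            if child_n[i] == 0:
                return float("inf")   # try every action at least once first
            return child_q[i] / child_n[i] + c * math.sqrt(math.log(node_n) / child_n[i])
        return max(range(len(child_n)), key=score)

    # Example: parent visited 10 times, two children with (visits, total return)
    # (4, 2.0) and (6, 4.5); ucb_select(10, [4, 6], [2.0, 4.5]) returns 1 here.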