• Corpus ID: 14273320

PILCO: A Model-Based and Data-Efficient Approach to Policy Search

@inproceedings{Deisenroth2011PILCOAM,
  title={PILCO: A Model-Based and Data-Efficient Approach to Policy Search},
  author={Marc Peter Deisenroth and Carl Edward Rasmussen},
  booktitle={ICML},
  year={2011}
}
In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. [...] Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.
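
The sketch below gives a rough, simplified picture of the kind of model-based policy search loop the abstract describes: fit a probabilistic (Gaussian process) dynamics model to the data collected so far, evaluate a parametric policy by rolling it out in the model, improve the policy by gradient descent on the predicted cost, and collect one more real trial. Everything in it is an illustrative assumption rather than the paper's implementation: the toy environment, the linear policy, the quadratic cost, mean-only GP rollouts (PILCO propagates full Gaussian state distributions in closed form), and finite-difference gradients (PILCO's gradients are analytic).

```python
# Minimal PILCO-flavoured loop (illustrative sketch, not the authors' implementation).
# Assumptions: a toy 1-D mass-with-drag environment, a saturated linear policy,
# a quadratic cost, GP-mean rollouts, and finite-difference policy gradients.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

def env_step(x, u):
    """Hypothetical true dynamics: position/velocity with drag and process noise."""
    pos, vel = x
    vel = 0.95 * vel + 0.1 * u + 0.01 * rng.standard_normal()
    pos = pos + 0.1 * vel
    return np.array([pos, vel])

def cost(x):
    return float(x[0] ** 2)  # quadratic cost on distance from the origin

def policy(theta, x):
    return float(np.clip(theta @ x, -1.0, 1.0))  # saturated linear controller

def rollout_real(theta, x0, T=25):
    """Interact with the real system; record (state, action) -> next-state pairs."""
    X, Y, x = [], [], x0.copy()
    for _ in range(T):
        u = policy(theta, x)
        x_next = env_step(x, u)
        X.append(np.r_[x, u]); Y.append(x_next)
        x = x_next
    return np.array(X), np.array(Y)

def predicted_return(theta, model, x0, T=25):
    """Roll the policy through the learned model (GP mean only) and sum costs."""
    x, total = x0.copy(), 0.0
    for _ in range(T):
        u = policy(theta, x)
        x = model.predict(np.r_[x, u].reshape(1, -1))[0]
        total += cost(x)
    return total

def improve_policy(theta, model, x0, iters=30, lr=0.05, eps=1e-3):
    """Finite-difference gradient descent on the model-predicted return."""
    for _ in range(iters):
        base = predicted_return(theta, model, x0)
        grad = np.zeros_like(theta)
        for i in range(len(theta)):
            t = theta.copy(); t[i] += eps
            grad[i] = (predicted_return(t, model, x0) - base) / eps
        theta = theta - lr * grad
    return theta

theta = rng.normal(size=2) * 0.1
x0 = np.array([1.0, 0.0])
X_all, Y_all = rollout_real(theta, x0)            # initial (near-random) trial
for trial in range(5):
    kernel = RBF(length_scale=np.ones(3)) + WhiteKernel(1e-2)
    model = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_all, Y_all)
    theta = improve_policy(theta, model, x0)      # plan entirely inside the model
    X_new, Y_new = rollout_real(theta, x0)        # one additional real trial
    X_all, Y_all = np.vstack([X_all, X_new]), np.vstack([Y_all, Y_new])
    print(f"trial {trial}: real cost {sum(cost(y) for y in Y_new):.3f}")
```

The key design point mirrored here is data efficiency: all policy optimization happens against the learned model, so the real system is only queried once per outer iteration.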

Citations

Gradient-Aware Model-based Policy Search

TLDR
A novel model-based policy search approach that exploits the knowledge of the current agent policy to learn an approximate transition model, focusing on the portions of the environment that are most relevant for policy improvement.

Model-Based Reinforcement Learning via Proximal Policy Optimization

TLDR
A data-efficient model-based approach called PIPPO (probabilistic inference via PPO), which makes online probabilistic dynamic model inference based on Gaussian process regression and executes offline policy improvement using PPO on the inferred model.

Data-Efficient Reinforcement Learning in Continuous-State POMDPs

TLDR
This method extends the highly data-efficient PILCO algorithm into partially observed Markov decision processes (POMDPs) by considering the filtering process during policy evaluation, achieving significantly higher performance than combining a filter with a policy optimised by the original (unfiltered) framework.

Learning Robust Controllers Via Probabilistic Model-Based Policy Search

TLDR
It is shown that enforcing a lower bound on the likelihood noise in the Gaussian process dynamics model regularizes the policy updates and yields more robust controllers.
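
A minimal way to see the mechanism summarized above, sketched with scikit-learn on made-up data: bounding the WhiteKernel noise level from below prevents hyperparameter optimization from shrinking the likelihood noise toward zero. The data, kernel choice, and the specific floor of 1e-2 are assumptions for illustration, not values taken from the paper.

```python
# Hypothetical illustration: enforcing a lower bound on the GP likelihood noise
# by bounding the WhiteKernel noise level during hyperparameter optimization.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(40)   # toy dynamics-like data

# noise_level_bounds=(1e-2, 1e1) keeps the optimized likelihood noise >= 1e-2,
# which regularizes the model (and, downstream, the policy updates it drives).
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1,
                                             noise_level_bounds=(1e-2, 1e1))
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
print("optimized kernel:", gp.kernel_)
```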

Uncertainty-aware Model-based Policy Optimization

TLDR
This paper proposes a novel uncertainty-aware model-based policy optimization framework which shows promising results on challenging continuous control benchmarks with competitive asymptotic performance and significantly lower sample complexity than state-of-the-art baselines.

Learning Off-Policy with Online Planning

TLDR
This work proposes Learning Off-Policy with Online Planning (LOOP), combining the techniques from model-based and model-free reinforcement learning algorithms, and introduces "actor-guided" trajectory optimization to mitigate the actor-divergence issue in the proposed method.

SAMBA: safe model-based & active reinforcement learning

In this paper, we propose SAMBA, a novel framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics. Our method builds upon PILCO

Data-Efficient Reinforcement Learning in Continuous State-Action Gaussian-POMDPs

TLDR
This work extends PILCO with filtering to instead plan in belief space, consistent with partially observable Markov decision process (POMDP) planning, which enables data-efficient learning under significant observation noise, outperforming more naive methods.

Regularizing Model-Based Planning with Energy-Based Models

TLDR
It is shown that off-policy training of an energy estimator can be effectively used to regularize planning with pre-trained dynamics models and enables sample-efficient learning to achieve competitive performance in challenging continuous control tasks such as Half-cheetah and Ant.

Hierarchical model-based policy optimization: from actions to action sequences and back

TLDR
This work develops a normative framework for hierarchical model-based policy optimization based on applying second-order methods in the space of all possible state-action paths and demonstrates that the natural path gradient can be computed exactly given an environment dynamics model and depends on expressions akin to higher-order successor representations.
...

References

SHOWING 1-10 OF 39 REFERENCES

Using inaccurate models in reinforcement learning

TLDR
This paper presents a hybrid algorithm that requires only an approximate model and a small number of real-life trials, and achieves near-optimal performance in the real system even when the model is only approximate.

Probabilistic Inference for Fast Learning in Control

TLDR
This work provides a novel framework for very fast model-based reinforcement learning in continuous state and action spaces and uses flexible, non-parametric models to describe the world based on previously collected experience.

Model-free off-policy reinforcement learning in continuous environment

  • P. Wawrzynski, A. Pacut
  • Computer Science
    2004 IEEE International Joint Conference on Neural Networks
  • 2004
TLDR
An algorithm for reinforcement learning in continuous state and action spaces that utilizes the entire history of agent-environment interaction to construct a control policy, requiring an interaction history several times shorter than that needed by other algorithms.

Incorporating Domain Models into Bayesian Optimization for RL

TLDR
This paper uses the observed data to learn approximate transition models that allow for Monte-Carlo predictions of policy returns, which are incorporated into the Bayesian Optimization framework as a type of prior on policy returns and can better inform the BO process.

Efficient reinforcement learning using Gaussian processes

TLDR
PILCO, a fully Bayesian approach for efficient RL in continuous-valued state and action spaces when no expert knowledge is available, is introduced, together with principled algorithms for robust filtering and smoothing in GP dynamic systems.
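
As a toy illustration of the Gaussian process machinery this line of work relies on, the snippet below fits a GP to synthetic (state, action) -> next-state data and queries both a predictive mean and a predictive standard deviation; a PILCO-style planner would propagate that uncertainty over whole rollouts rather than reading it off at a single point. All data and model choices here are placeholder assumptions.

```python
# Hypothetical toy example: a GP dynamics model that reports predictive uncertainty,
# the quantity a fully Bayesian model-based learner can exploit during planning.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)
states = rng.uniform(-1, 1, size=(60, 2))
actions = rng.uniform(-1, 1, size=(60, 1))
next_states = 0.9 * states + 0.1 * actions + 0.02 * rng.standard_normal((60, 2))

X = np.hstack([states, actions])                       # inputs: (state, action)
gp = GaussianProcessRegressor(RBF([1.0, 1.0, 1.0]) + WhiteKernel(1e-3),
                              normalize_y=True).fit(X, next_states)

query = np.array([[0.5, -0.2, 0.3]])                   # one (state, action) pair
mean, std = gp.predict(query, return_std=True)
print("predicted next state:", mean[0])
print("predictive std (model uncertainty):", std[0])
```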

Exploiting Model Uncertainty Estimates for Safe Dynamic Control Learning

TLDR
This paper addresses the case where the system must be prevented from having catastrophic failures during learning, and proposes a new algorithm, adapted from the dual control literature, that uses Bayesian locally weighted regression models with dynamic programming.

Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning

TLDR
It is demonstrated how a low-cost off-the-shelf robotic system can learn closed-loop policies for a stacking task in only a handful of trials, from scratch.

Reinforcement Learning in Continuous Time and Space

  • K. Doya
  • Computer Science
    Neural Computation
  • 2000
This article presents a reinforcement learning framework for continuous-time dynamical systems without a priori discretization of time, state, and action, based on the Hamilton-Jacobi-Bellman (HJB) equation.
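
For context, the optimality condition such a continuous-time framework starts from can be written, in generic notation rather than the paper's own (value function V*, dynamics dx/dt = f(x, u), reward r, discounting with time constant tau), roughly as:

```latex
% Continuous-time discounted HJB optimality condition (notation paraphrased).
\[
  \frac{1}{\tau}\, V^{*}\bigl(x(t)\bigr)
  = \max_{u}\left[\, r\bigl(x(t), u\bigr)
      + \frac{\partial V^{*}}{\partial x}\, f\bigl(x(t), u\bigr) \right]
\]
```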

Policy Gradient Methods for Robotics

  • Jan Peters, S. Schaal
  • Computer Science
    2006 IEEE/RSJ International Conference on Intelligent Robots and Systems
  • 2006
TLDR
An overview of learning with policy gradient methods for robotics is given, with a strong focus on recent advances in the field, and it is shown how the most recently developed methods can significantly improve learning performance.
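
As a reminder of the quantity these methods estimate, the likelihood-ratio form of the policy gradient, written in generic notation rather than the paper's own, is:

```latex
% Likelihood-ratio policy gradient (generic notation):
% J(theta) is the expected return of the stochastic policy pi_theta.
\[
  \nabla_{\theta} J(\theta)
  = \mathbb{E}_{\pi_{\theta}}\!\left[
      \nabla_{\theta} \log \pi_{\theta}(a \mid s)\; Q^{\pi_{\theta}}(s, a)
    \right]
\]
```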

Gaussian process dynamic programming