Corpus ID: 14273320

PILCO: A Model-Based and Data-Efficient Approach to Policy Search

@inproceedings{Deisenroth2011PILCOAM,
  title={PILCO: A Model-Based and Data-Efficient Approach to Policy Search},
  author={Marc Peter Deisenroth and Carl Edward Rasmussen},
  booktitle={International Conference on Machine Learning},
  year={2011}
}
In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.
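As a rough illustration of this loop (learn a probabilistic dynamics model from interaction data, evaluate the policy by propagating uncertainty through the model, then improve the policy with gradients), a minimal Python sketch follows. The toy one-dimensional system, the linear policy, the Monte-Carlo propagation standing in for the paper's closed-form moment matching, and the finite-difference update standing in for its analytic policy gradients are all illustrative assumptions, not the authors' implementation.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

def true_dynamics(x, u):
    # Unknown system the agent interacts with (toy example).
    return 0.9 * x + 0.5 * np.tanh(u) + 0.01 * rng.standard_normal()

def collect_data(policy_w, n_steps=30):
    # Roll out the current policy on the real system and record transitions.
    X, y, x = [], [], 1.0
    for _ in range(n_steps):
        u = policy_w * x                         # linear state-feedback policy
        x_next = true_dynamics(x, u)
        X.append([x, u])
        y.append(x_next - x)                     # model predicts the state change
        x = x_next
    return np.array(X), np.array(y)

def expected_cost(policy_w, gp, horizon=20, n_samples=100):
    # Approximate policy evaluation: propagate a state distribution through the
    # learned GP model and accumulate a saturating cost. PILCO performs this step
    # in closed form via moment matching; plain sampling is used here for clarity.
    xs = np.full(n_samples, 1.0)
    total = 0.0
    for _ in range(horizon):
        us = policy_w * xs
        mu, std = gp.predict(np.column_stack([xs, us]), return_std=True)
        xs = xs + mu + std * rng.standard_normal(n_samples)
        total += np.mean(1.0 - np.exp(-0.5 * xs**2))   # cost of being far from 0
    return total

policy_w = -0.1
for trial in range(3):
    X, y = collect_data(policy_w)                             # interact with the system
    gp = GaussianProcessRegressor(RBF() + WhiteKernel()).fit(X, y)  # learn the model
    eps = 1e-2                                                # finite-difference gradient
    grad = (expected_cost(policy_w + eps, gp) - expected_cost(policy_w - eps, gp)) / (2 * eps)
    policy_w -= 0.5 * grad                                    # policy improvement step
    print(f"trial {trial}: policy weight {policy_w:.3f}")

In the method itself, the predictive state distribution is propagated analytically, which is what makes the policy gradient available in closed form rather than by sampling or finite differences.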

Citations of this paper

Gradient-Aware Model-based Policy Search

A novel model-based policy search approach that exploits the knowledge of the current agent policy to learn an approximate transition model, focusing on the portions of the environment that are most relevant for policy improvement.

Model-Based Reinforcement Learning via Proximal Policy Optimization

A data-efficient model-based approach called PIPPO (probabilistic inference via PPO), which performs online probabilistic dynamics model inference based on Gaussian process regression and executes offline policy improvement using PPO on the inferred model.

Data-Efficient Reinforcement Learning in Continuous-State POMDPs

This method extends the highly data-efficient PILCO algorithm into partially observed Markov decision processes (POMDPs) by considering the filtering process during policy evaluation, achieving significantly higher performance than combining a filter with a policy optimised by the original (unfiltered) framework.

Learning Robust Controllers Via Probabilistic Model-Based Policy Search

It is shown that enforcing a lower bound to the likelihood noise in the Gaussian Process dynamics model regularizes the policy updates and yields more robust controllers.
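The noise floor described above can be sketched with standard GP tooling; the following minimal scikit-learn example is an illustration only (the toy data and the bound of 1e-2 are assumptions, not values from the cited paper):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

X = np.random.rand(50, 2)                            # toy (state, action) inputs
y = np.sin(X[:, 0]) + 0.05 * np.random.randn(50)     # toy observed state changes

# WhiteKernel models the likelihood noise; restricting its lower bound keeps the
# hyperparameter optimizer from collapsing the noise to zero and over-trusting the data.
kernel = RBF() + WhiteKernel(noise_level=0.1, noise_level_bounds=(1e-2, 1e1))
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
print(gp.kernel_)                                    # fitted noise level stays >= 1e-2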

Learning Off-Policy with Online Planning

This work proposes Learning Off-Policy with Online Planning (LOOP), combining the techniques from model-based and model-free reinforcement learning algorithms, and introduces "actor-guided" trajectory optimization to mitigate the actor-divergence issue in the proposed method.

SAMBA: safe model-based & active reinforcement learning

In this paper, we propose SAMBA, a novel framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics. Our method builds upon PILCO.

Data-Efficient Reinforcement Learning in Continuous State-Action Gaussian-POMDPs

This work extends PILCO with filtering to instead plan in belief space, consistent with partially observable Markov decision process (POMDP) planning, which enables data-efficient learning under significant observation noise, outperforming more naive methods.

Regularizing Model-Based Planning with Energy-Based Models

It is shown that off-policy training of an energy estimator can be effectively used to regularize planning with pre-trained dynamics models and enables sample-efficient learning to achieve competitive performance in challenging continuous control tasks such as Half-cheetah and Ant.

Minimax Model Learning

A novel off-policy loss function for learning a transition model in model-based reinforcement learning that allows for greater robustness under model misspecification or distribution shift induced by learning/evaluating policies that are distinct from the data-generating policy.

Exploring Model-based Planning with Policy Networks

This paper proposes a novel MBRL algorithm, model-based policy planning (POPLIN), that combines policy networks with online planning and shows that POPLIN obtains state-of-the-art performance in the MuJoCo benchmarking environments, being about 3x more sample efficient than state-of-the-art algorithms such as PETS, TD3, and SAC.
...

References

Showing 1-10 of 39 references

Using inaccurate models in reinforcement learning

This paper presents a hybrid algorithm that requires only an approximate model, and only a small number of real-life trials, and achieves near-optimal performance in the real system, even when the model is only approximate.

Probabilistic Inference for Fast Learning in Control

This work provides a novel framework for very fast model-based reinforcement learning in continuous state and action spaces and uses flexible, non-parametric models to describe the world based on previously collected experience.

Model-free off-policy reinforcement learning in continuous environment

  • P. Wawrzynski, A. Pacut
  • Computer Science
    2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541)
  • 2004
An algorithm for reinforcement learning in continuous state and action spaces that utilizes the entire history of agent-environment interaction; the interaction history it requires to obtain a good control policy is several times shorter than that required by other algorithms.

Incorporating Domain Models into Bayesian Optimization for RL

This paper uses the observed data to learn approximate transitions models that allow for Monte-Carlo predictions of policy returns that are incorporated into the Bayesian Optimization framework as a type of prior on policy returns, which can better inform the BO process.

Efficient reinforcement learning using Gaussian processes

First, PILCO, a fully Bayesian approach for efficient RL in continuous-valued state and action spaces when no expert knowledge is available, is introduced; second, principled algorithms for robust filtering and smoothing in GP dynamic systems are proposed.

Exploiting Model Uncertainty Estimates for Safe Dynamic Control Learning

This paper addresses the case where the system must be prevented from having catastrophic failures during learning, and proposes a new algorithm adapted from the dual control literature that uses Bayesian locally weighted regression models with dynamic programming.

Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning

It is demonstrated how a low-cost, off-the-shelf robotic system can learn closed-loop policies for a stacking task in only a handful of trials, from scratch.

Reinforcement Learning in Continuous Time and Space

  • K. Doya
  • Computer Science
    Neural Computation
  • 2000
This article presents a reinforcement learning framework for continuous-time dynamical systems without a priori discretization of time, state, and action. Based on the Hamilton-Jacobi-Bellman (HJB) equation, algorithms for estimating value functions and improving policies are derived.

Policy Gradient Methods for Robotics

  • Jan Peters, S. Schaal
  • Computer Science
    2006 IEEE/RSJ International Conference on Intelligent Robots and Systems
  • 2006
An overview of learning with policy gradient methods for robotics is given, with a strong focus on recent advances in the field, and it is shown how the most recently developed methods can significantly improve learning performance.

Gaussian process dynamic programming