# Efficient reinforcement learning using Gaussian processes

@inproceedings{Deisenroth2010EfficientRL, title={Efficient reinforcement learning using Gaussian processes}, author={Marc Peter Deisenroth}, year={2010} }

This book examines Gaussian processes (GPs) in both model-based reinforcement learning (RL) and inference in nonlinear dynamic systems. First, it introduces PILCO, a model-based policy search method that takes model uncertainties consistently into account during long-term planning to reduce model bias. Second, it proposes principled algorithms for robust filtering and smoothing in GP dynamic systems.
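The central ingredient of such GP-based methods is a model whose predictions come with calibrated uncertainty. The following is an illustrative GP regression sketch in plain NumPy (squared-exponential kernel, arbitrary hyperparameters and toy data), not code from the thesis:

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between row-wise inputs A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_predict(X, y, Xs, noise=1e-2):
    """GP posterior mean and variance at test inputs Xs."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    mean = Ks.T @ alpha
    var = np.diag(rbf_kernel(Xs, Xs)) - (v ** 2).sum(axis=0)
    return mean, var

# Toy 1-D data: treat y as, say, the next state given the current state x.
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 20)[:, None]
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(20)
mean, var = gp_predict(X, y, np.array([[0.0], [10.0]]))
# Near the data (x = 0) the predictive variance is small; far away (x = 10)
# it is large. This is the model uncertainty that PILCO propagates through
# long-term predictions instead of trusting a single learned model.
```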

## Figures and Tables from this paper

The thesis includes figures 1.1-4.10 and tables 2.1-4.5 across Chapters 1-4, plus figures C.1-C.4 and tables D.1-D.4 in the appendices.

## 187 Citations

Off-policy reinforcement learning with Gaussian processes

- Computer ScienceIEEE/CAA Journal of Automatica Sinica
- 2014

An off-policy Bayesian nonparametric approximate reinforcement learning framework that employs a Gaussian process model of the value function, offering competitive learning speed in addition to convergence guarantees and the ability to automatically choose its own basis locations.

Uncertainty Estimation in Continuous Models applied to Reinforcement Learning

- Computer Science
- 2019

This work considers the model-based reinforcement learning framework, in which the goal is to learn a model and a control policy for a given objective, and models the dynamics of the environment using Gaussian processes or a Bayesian neural network.

Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control

- Computer ScienceAISTATS
- 2018

This work proposes a model-based RL framework built on probabilistic Model Predictive Control with Gaussian Process transition models, which incorporates model uncertainty into long-term predictions to reduce the impact of model errors, and provides theoretical guarantees of first-order optimality for GP-based transition models with deterministic approximate inference for long-term planning.
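The receding-horizon loop at the heart of MPC-style model-based RL can be sketched generically. This toy uses a known linear model and a random-shooting planner in place of the paper's GP model and gradient-based optimization; all names and constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def model(x, u):
    """Stand-in learned dynamics (the paper would use a GP model here)."""
    return 0.9 * x + 0.5 * u

def plan(x0, horizon=10, n_candidates=200):
    """Random-shooting planner: sample action sequences, roll them out with
    the model, and return the first action of the cheapest sequence."""
    best_u0, best_cost = 0.0, np.inf
    for _ in range(n_candidates):
        us = rng.uniform(-1.0, 1.0, horizon)
        x, cost = x0, 0.0
        for u in us:
            x = model(x, u)
            cost += x**2 + 0.1 * u**2   # quadratic cost around the origin
        if cost < best_cost:
            best_cost, best_u0 = cost, us[0]
    return best_u0

# Receding horizon: replan at every step, apply only the first action.
x = 5.0
for t in range(15):
    u = plan(x)
    x = model(x, u)
# x has been driven close to the origin.
```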

Online reinforcement learning by Bayesian inference

- Computer Science2015 International Joint Conference on Neural Networks (IJCNN)
- 2015

This paper proposes an online reinforcement learning algorithm, referred to as Bayesian-SARSA, that sequentially updates via Bayesian inference according to observed variables such as state and reward during policy evaluation.

Model-Based Bayesian Sparse Sampling for Data Efficient Control

- Computer Science
- 2019

A novel Bayesian-inspired model-based policy search algorithm for data-efficient control that makes use of approximate Gaussian processes, in the form of random Fourier features, for fast online system identification and computationally efficient posterior updates via rank-one Cholesky updates.
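A minimal sketch of this idea, assuming random Fourier features that approximate an RBF kernel and a Kalman-style rank-one posterior update (a Sherman-Morrison update of the covariance rather than the Cholesky form used in the paper); everything here is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 100, 1            # number of random features, input dimension
ls, noise = 1.0, 0.1     # RBF lengthscale, observation noise std

# Random Fourier features: phi(x)^T phi(x') approximates an RBF kernel.
W = rng.standard_normal((D, d)) / ls
b = rng.uniform(0.0, 2 * np.pi, D)

def phi(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

# Bayesian linear regression on the features, updated one point at a time.
m = np.zeros(D)          # posterior mean of the weights
S = np.eye(D)            # posterior covariance (unit Gaussian prior)
for _ in range(200):
    x = rng.uniform(-3.0, 3.0, d)
    y = np.sin(x[0]) + noise * rng.standard_normal()
    f = phi(x)
    Sf = S @ f
    k = Sf / (noise**2 + f @ Sf)     # Kalman gain (rank-one update)
    m = m + k * (y - f @ m)
    S = S - np.outer(k, Sf)
# phi(x_test) @ m now approximates sin(x_test) on [-3, 3].
```

Each update costs O(D^2) rather than the O(n^3) of exact GP inference, which is what makes the online system-identification setting tractable.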

Multi-Fidelity Model-Free Reinforcement Learning with Gaussian Processes

- Computer Science
- 2018

A model-free Multi-Fidelity Reinforcement Learning (MFRL) algorithm that leverages Gaussian Processes (GPs) to learn the optimal policy in the real world while reducing the number of learning samples needed as the agent moves up the simulator chain.

Multi-Fidelity Reinforcement Learning with Gaussian Processes

- Computer ScienceArXiv
- 2017

Two versions of Multi-Fidelity Reinforcement Learning (MFRL) are presented, model-based and model-free, that leverage Gaussian Processes (GPs) to learn the optimal policy in a real-world environment.

Gaussian Processes for Data-Efficient Learning in Robotics and Control

- Computer ScienceIEEE Transactions on Pattern Analysis and Machine Intelligence
- 2015

This paper learns a probabilistic, non-parametric Gaussian process transition model of the system and applies it to autonomous learning in real robot and control tasks, achieving an unprecedented speed of learning.

Online Constrained Model-based Reinforcement Learning

- Computer ScienceUAI
- 2017

This work proposes a model-based approach that combines Gaussian Process regression and Receding Horizon Control, resulting in an agent that can learn and plan in real time under nonlinear constraints, and demonstrates the benefits of online learning on an autonomous racing task.

## References

Showing 1-10 of 267 references.

Gaussian Processes in Reinforcement Learning

- Computer ScienceNIPS
- 2003

It is speculated that the intrinsic ability of GP models to characterise distributions of functions would allow the method to capture entire distributions over future values instead of merely their expectation, which has traditionally been the focus of much of reinforcement learning.

A Bayesian Framework for Reinforcement Learning

- Computer ScienceICML
- 2000

It is proposed that the learning process estimate the full posterior distribution over models online; to determine behavior, a hypothesis is sampled from this distribution and the greedy policy with respect to that hypothesis is obtained by dynamic programming.
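This posterior-sampling scheme can be sketched with a Dirichlet posterior over transition probabilities: sample one hypothesis MDP, then compute its greedy policy by value iteration. The counts and rewards below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 3, 2, 0.9

# Dirichlet posterior over each transition distribution, parameterized by
# transition counts (fake counts here, standing in for observed experience).
counts = np.ones((nS, nA, nS))
counts[0, 1, 2] += 20      # action 1 in state 0 usually reaches state 2
counts[0, 0, 0] += 20      # action 0 in state 0 usually stays put
counts[2, :, 2] += 100     # state 2 is nearly absorbing
R = np.array([0.0, 0.0, 1.0])   # reward for the state reached

# Sample one hypothesis MDP from the posterior.
P = np.array([[rng.dirichlet(counts[s, a]) for a in range(nA)]
              for s in range(nS)])

# Greedy policy for the sampled model via value iteration (dynamic programming).
V = np.zeros(nS)
for _ in range(200):
    Q = P @ (R + gamma * V)   # Q[s, a] = E_{s'}[ R(s') + gamma * V(s') ]
    V = Q.max(axis=1)
policy = Q.argmax(axis=1)
# In state 0 the sampled model favors action 1, which leads to the reward.
```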

Reinforcement learning with Gaussian processes

- Computer ScienceICML
- 2005

A SARSA based extension of GPTD is presented, termed GPSARSA, that allows the selection of actions and the gradual improvement of policies without requiring a world-model.

State-Space Inference and Learning with Gaussian Processes

- Computer ScienceAISTATS
- 2010

A new, general methodology is proposed for inference and learning in nonlinear state-space models that are described probabilistically by non-parametric GP models, and the expectation-maximization algorithm is applied.

A Bayesian Sampling Approach to Exploration in Reinforcement Learning

- Computer ScienceUAI
- 2009

This work presents a modular approach to reinforcement learning that uses a Bayesian representation of the uncertainty over models and achieves near-optimal reward with high probability with a sample complexity that is low relative to the speed at which the posterior distribution converges during learning.

Model-free off-policy reinforcement learning in continuous environment

- Computer Science2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541)
- 2004

An algorithm for reinforcement learning in continuous state and action spaces that utilizes the entire history of agent-environment interaction to construct a control policy, requiring an interaction history several times shorter than other algorithms require.

An analytic solution to discrete Bayesian reinforcement learning

- Computer ScienceICML
- 2006

This work proposes a new algorithm, called BEETLE, for effective online learning that is computationally efficient while minimizing the amount of exploration, and takes a Bayesian model-based approach, framing RL as a partially observable Markov decision process.

Learning nonlinear state-space models for control

- Computer ScienceProceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005.
- 2005

Simulations with a cart-pole swing-up task confirm that the latent state space provides a representation that is easier to predict and control than the original observation space.