Corpus ID: 54568857

Efficient Model-Free Reinforcement Learning Using Gaussian Process

@article{Fan2018EfficientMR,
  title={Efficient Model-Free Reinforcement Learning Using Gaussian Process},
  author={Ying Fan and Letian Chen and Yizhou Wang},
  journal={ArXiv},
  year={2018},
  volume={abs/1812.04359}
}
Efficient reinforcement learning usually takes advantage of demonstrations or a good exploration strategy. By applying posterior sampling in model-free RL under a Gaussian process (GP) hypothesis, we propose the Gaussian Process Posterior Sampling Reinforcement Learning (GPPSTD) algorithm for continuous state spaces, giving theoretical justifications and empirical results. We also provide theoretical and empirical results that various demonstrations can lower expected uncertainty and benefit posterior sampling…
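
The abstract's core mechanism, Thompson-style posterior sampling from a GP model of the value function, can be sketched briefly. The snippet below is a minimal illustration under assumed details, not the paper's GPPSTD algorithm: it invents a toy buffer of (state, action, return) samples, fits scikit-learn's GaussianProcessRegressor over state-action inputs, and selects actions by drawing one function from the GP posterior and acting greedily on that draw; the select_action helper, the kernel choice, and the toy data are hypothetical.

# Minimal sketch of posterior (Thompson) sampling with a GP value model.
# Assumed setup: two discrete actions and a toy buffer of (state, action,
# observed return) samples; an illustration, not the paper's exact GPPSTD.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def select_action(gp, state, actions, rng):
    # Draw a single Q-function realization from the GP posterior,
    # evaluate it at every candidate action, and act greedily on it.
    X = np.array([np.concatenate([state, a]) for a in actions])
    q_sample = gp.sample_y(X, n_samples=1, random_state=rng).ravel()
    return actions[int(np.argmax(q_sample))]

# Toy training data (hypothetical): 1-D state, 1-D action encoding, noisy returns.
rng = np.random.RandomState(0)
actions = [np.array([0.0]), np.array([1.0])]
states = rng.uniform(-1.0, 1.0, size=(50, 1))
chosen = rng.randint(2, size=50)
X_train = np.hstack([states, chosen[:, None].astype(float)])
y_train = np.sin(3.0 * states[:, 0]) * (2 * chosen - 1) + 0.1 * rng.randn(50)

kernel = RBF(length_scale=0.5) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_train, y_train)

# Each call resamples from the posterior, so exploration follows posterior uncertainty.
print(select_action(gp, np.array([0.3]), actions, rng))

Because a fresh function is drawn for each decision, actions whose value estimates are still uncertain are tried more often, which is the exploration benefit the abstract attributes to posterior sampling.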

Citations

Coordinated Control of UAVs for Human-Centered Active Sensing of Wildfires
TLDR
A dual-criterion objective function based on Kalman uncertainty residual propagation and weighted multi-agent consensus protocol is developed, which enables the UAVs to actively infer the wildfire dynamics and parameters, track and monitor the fire transition, and safely manage human firefighters on the ground using acquired information.
Zeroth-Order Supervised Policy Improvement
TLDR
It is proved that, with a good function structure, a zeroth-order optimization strategy combining both local and global sampling can find the global minimum within a polynomial number of samples.

References

SHOWING 1-10 OF 37 REFERENCES
Off-policy reinforcement learning with Gaussian processes
TLDR
An off-policy Bayesian nonparametric approximate reinforcement learning framework that employs a Gaussian process model of the value function, with competitive learning speed in addition to its convergence guarantees and its ability to automatically choose its own basis locations.
(More) Efficient Reinforcement Learning via Posterior Sampling
TLDR
An Õ(τS√(AT)) bound on expected regret is established, one of the first for an algorithm not based on optimism, and close to the state of the art for any reinforcement learning algorithm.
Sample Efficient Reinforcement Learning with Gaussian Processes
TLDR
It is shown that GPs are KWIK learnable, and it is proved for the first time that a model-based RL approach using GPs, GP-Rmax, is sample efficient (PAC-MDP); it is also shown that previous approaches to model-free RL using GPs take an exponential number of steps to find an optimal policy and are therefore not sample efficient.
Efficient Exploration Through Bayesian Deep Q-Networks
TLDR
Bayesian Deep Q-Network (BDQN), a practical Thompson sampling based Reinforcement Learning (RL) Algorithm, is proposed, which can be trained with fast closed-form updates and its samples can be drawn efficiently through the Gaussian distribution.
Gaussian processes for informative exploration in reinforcement learning
This paper presents the iGP-SARSA(λ) algorithm for temporal difference reinforcement learning (RL) with non-myopic information gain considerations. The proposed algorithm uses a Gaussian process (GP)…
A Bayesian Framework for Reinforcement Learning
TLDR
It is proposed that the learning process estimate the full posterior distribution over models online; to determine behavior, a hypothesis is sampled from this distribution and the greedy policy with respect to that hypothesis is obtained by dynamic programming.
Improving Optimistic Exploration in Model-Free Reinforcement Learning
TLDR
Two extensions to standard optimistic exploration are proposed, based on different initialisations of the value function of goal states, which improve anytime performance and help in domains where learning takes place in a subspace of the large state space, that is, where the standard optimistic approach faces more difficulties.
Reinforcement learning with Gaussian processes
TLDR
A SARSA-based extension of GPTD, termed GPSARSA, is presented that allows the selection of actions and the gradual improvement of policies without requiring a world model.
Bayesian Inverse Reinforcement Learning
TLDR
This paper shows how to combine prior knowledge and evidence from the expert's actions to derive a probability distribution over the space of reward functions and presents efficient algorithms that find solutions for the reward learning and apprenticeship learning tasks that generalize well over these distributions.
Maximum Entropy Inverse Reinforcement Learning
TLDR
A probabilistic approach based on the principle of maximum entropy is developed that provides a well-defined, globally normalized distribution over decision sequences while offering the same performance guarantees as existing methods.