Modelling Policies in MDPs in Reproducing Kernel Hilbert Space

Guy Lever and Ronnie Stafford
We consider modelling policies for MDPs in (vector-valued) reproducing kernel Hilbert spaces (RKHS). This enables us to work “non-parametrically” in a rich function class, and provides the ability to learn complex policies. We present a framework for performing gradient-based policy optimization in the RKHS, deriving the functional gradient of the return for our policy, which has a simple form and can be estimated efficiently. The policy representation naturally focuses on the relevant…
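
To make the idea of a functional gradient step concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes a Gaussian policy with fixed standard deviation `sigma`, scalar states and actions, and a Gaussian kernel, and it takes return estimates as given inputs. Under those assumptions the reproducing property gives the functional gradient of log π(a|s) with respect to the mean function h as k(s, ·)(a − h(s))/σ², so each sampled state becomes a new kernel centre. All names (`KernelPolicyMean`, `functional_gradient_step`, `gaussian_kernel`) are hypothetical.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Scalar Gaussian (RBF) kernel k(x, y)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


class KernelPolicyMean:
    """Policy mean h(s) = sum_i w_i * k(c_i, s), an element of the RKHS."""

    def __init__(self, bandwidth=1.0):
        self.centers = []   # states c_i at which kernels are placed
        self.weights = []   # coefficients w_i
        self.bandwidth = bandwidth

    def __call__(self, s):
        return sum(
            w * gaussian_kernel(c, s, self.bandwidth)
            for c, w in zip(self.centers, self.weights)
        )

    def functional_gradient_step(self, samples, sigma=1.0, step_size=0.1):
        """One ascent step on an estimated policy gradient.

        samples: list of (state, action, return_estimate) triples.
        The return estimates are an assumption of this sketch; how they
        are obtained is left to the caller.
        """
        n = len(samples)
        # Evaluate all residuals against the current h before changing it.
        new_terms = [
            (s, step_size * q * (a - self(s)) / (sigma ** 2) / n)
            for s, a, q in samples
        ]
        for center, weight in new_terms:
            self.centers.append(center)
            self.weights.append(weight)


h = KernelPolicyMean(bandwidth=1.0)
# One sample: in state 0.0 the action 1.0 was taken, with return estimate 2.0.
h.functional_gradient_step([(0.0, 1.0, 2.0)], sigma=1.0, step_size=0.5)
print(h(0.0), h(10.0))
```

The representation grows by one kernel centre per sample, so the mean changes only near states that were actually visited. A practical version would need some way to keep the number of centres bounded, which this sketch omits.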


Publications citing this paper.

Non-parametric Policy Search with Limited Information Loss

Journal of Machine Learning Research • 2017

Eager and Memory-Based Non-Parametric Stochastic Search Methods for Learning Control

2018 IEEE International Conference on Robotics and Automation (ICRA) • 2018

Kernel dynamic policy programming: Practical reinforcement learning for high-dimensional robots

2016 IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids) • 2016