# Relative Entropy Policy Search

```bibtex
@inproceedings{Peters2010RelativeEP,
  title     = {Relative Entropy Policy Search},
  author    = {Jan Peters and Katharina Muelling and Yasemin Altun},
  booktitle = {AAAI},
  year      = {2010}
}
```

Policy search is a successful approach to reinforcement learning. However, policy improvements often result in the loss of information. Hence, it has been marred by premature convergence and implausible solutions. As first suggested in the context of covariant policy gradients (Bagnell and Schneider 2003), many of these problems may be addressed by constraining the information loss. In this paper, we continue this path of reasoning and suggest the Relative Entropy Policy Search (REPS) method…
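The core idea — bounding the relative entropy between successive sample distributions — admits a simple episodic sketch. The helper names below are hypothetical, and a grid search over the temperature stands in for a proper optimization of the paper's dual problem:

```python
import numpy as np

def reps_weights(returns, epsilon=0.1):
    """Episodic REPS-style reweighting (a sketch, not the paper's full
    state-action formulation): pick a temperature eta by approximately
    minimizing the dual of the KL-constrained problem, then weight
    each sample by exp(return / eta)."""
    R = returns - returns.max()  # shift for numerical stability

    # Dual objective g(eta) = eta*epsilon + eta*log(mean(exp(R/eta)))
    def dual(eta):
        return eta * epsilon + eta * np.log(np.mean(np.exp(R / eta)))

    # Grid search stands in for a proper 1-D dual optimizer.
    etas = np.logspace(-3, 3, 200)
    eta = etas[np.argmin([dual(e) for e in etas])]
    w = np.exp(R / eta)
    return w / w.sum()

def weighted_gaussian_update(thetas, weights):
    """Weighted maximum-likelihood fit of a Gaussian search
    distribution over policy parameters."""
    mu = weights @ thetas
    diff = thetas - mu
    cov = (weights[:, None] * diff).T @ diff
    return mu, cov
```

Higher-return samples receive exponentially larger weight, but the temperature chosen from the dual caps how far the reweighted distribution can drift from the sampling distribution — the bounded information loss the abstract refers to.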

#### 508 Citations

Policy Search in Continuous Action Domains: an Overview

- 2018

Continuous action policy search is currently the focus of intensive research, driven both by the recent success of deep reinforcement learning algorithms and the emergence of competitors based on…

Policy Search in Continuous Action Domains: an Overview

- Computer Science, Medicine
- Neural Networks
- 2019

A broad survey of policy search methods is presented, providing a unified perspective on very different approaches, also including Bayesian Optimization and directed exploration methods.

Variational Bayesian Parameter-Based Policy Exploration

- Computer Science
- 2020 International Joint Conference on Neural Networks (IJCNN)
- 2020

An objective function is defined that explicitly accounts for reward uncertainty, and an algorithm is provided that uses a Bayesian method to optimize this function under the uncertainty of policy parameters in continuous state and action spaces for parameter-based policy exploration.

Weighted Likelihood Policy Search with Model Selection

- Computer Science
- NIPS
- 2012

A novel DPS method, weighted likelihood policy search (WLPS), where a policy is efficiently learned through weighted likelihood estimation, and a new measure for model comparison in DPS based on the weighted log-likelihood is developed.

Projections for Approximate Policy Iteration Algorithms

- Computer Science
- ICML
- 2019

This paper proposes to improve over existing approximate policy iteration algorithms by introducing a set of projections that transform the constrained problem into an unconstrained one, which is then solved by standard gradient descent.

Generalized exploration in policy search

- Computer Science
- Machine Learning
- 2017

This paper introduces a unifying view on step-based and episode-based exploration that allows for such balanced trade-offs, and evaluates the exploration strategy on four dynamical systems, showing that a more balanced trade-off can yield faster learning and better final policies.

Maximum a Posteriori Policy Optimisation

- Computer Science, Mathematics
- ICLR
- 2018

This work introduces a new algorithm for reinforcement learning called Maximum a Posteriori Policy Optimisation (MPO), based on coordinate ascent on a relative entropy objective, and develops two off-policy algorithms that are competitive with the state-of-the-art in deep reinforcement learning.

Relative Entropy Regularized Policy Iteration

- Mathematics, Computer Science
- ArXiv
- 2018

An off-policy actor-critic algorithm for Reinforcement Learning (RL) that combines ideas from gradient-free optimization via stochastic search with a learned action-value function, and can be seen either as an extension of the Maximum a Posteriori Policy Optimisation algorithm (MPO) or as an addition to a policy iteration scheme.

Policy Evaluation Networks

- Computer Science, Mathematics
- ArXiv
- 2020

The empirical results demonstrate that combining these three elements (learned Policy Evaluation Network, policy fingerprints, gradient ascent) can produce policies that outperform those that generated the training data, in a zero-shot manner.

Relative Entropy Inverse Reinforcement Learning

- Computer Science
- AISTATS
- 2011

This paper proposes a model-free IRL algorithm, where the relative entropy between the empirical distribution of the state-action trajectories under a baseline policy and their distribution under the learned policy is minimized by stochastic gradient descent.

#### References

Showing 1–10 of 19 references

Covariant Policy Search

- Mathematics, Computer Science
- IJCAI
- 2003

This work proposes a natural metric on controller parameterization that results from considering the manifold of probability distributions over paths induced by a stochastic controller, leading to a covariant gradient ascent rule.

Learning from Scarce Experience

- Computer Science
- ICML
- 2002

A family of algorithms based on likelihood ratio estimation that use data gathered while executing one policy (or collection of policies) to estimate the value of a different policy, with positive empirical results and a sample complexity bound.

Policy Gradient Methods for Reinforcement Learning with Function Approximation

- Mathematics, Computer Science
- NIPS
- 1999

This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.

A Natural Policy Gradient

- Computer Science
- NIPS
- 2001

This work provides a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space and shows drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris.
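The natural gradient described here preconditions the vanilla policy gradient by the inverse Fisher information matrix. A minimal sketch, assuming per-sample score vectors and advantage estimates are already available (the function name and damping term are illustrative, not from the paper):

```python
import numpy as np

def natural_gradient_step(grads_log_pi, advantages, damping=1e-3):
    """Natural policy gradient sketch.
    grads_log_pi: (N, d) array of score vectors grad_theta log pi(a|s)
    advantages:   (N,) array of advantage estimates
    Returns the natural gradient direction F^{-1} g."""
    n = len(advantages)
    g = grads_log_pi.T @ advantages / n          # vanilla policy gradient
    F = grads_log_pi.T @ grads_log_pi / n        # Fisher matrix estimate
    F += damping * np.eye(F.shape[0])            # damping keeps F invertible
    return np.linalg.solve(F, g)                 # F^{-1} g without explicit inverse
```

Unlike the vanilla gradient, this direction is invariant to smooth reparameterizations of the policy, which is what makes the update "covariant" in the sense the REPS abstract appeals to.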

Variational methods for Reinforcement Learning

- Mathematics, Computer Science
- AISTATS
- 2010

This work considers a Bayesian alternative that maintains a distribution over the transition model, so that the resulting policy takes into account the limited experience of the environment, and discusses two approximate solution methods, Variational Bayes and Expectation Propagation.

Natural Actor-Critic

- Computer Science
- ECML
- 2005

This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing Amari's natural gradient…

Efficient reinforcement learning using Gaussian processes

- Computer Science
- 2010

First, PILCO, a fully Bayesian approach for efficient RL in continuous-valued state and action spaces when no expert knowledge is available, is introduced; then, principled algorithms for robust filtering and smoothing in GP dynamic systems are proposed.

Introduction to Reinforcement Learning

- Computer Science
- 1998

In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning.

Policy Search for Motor Primitives in Robotics

- Computer Science
- NIPS
- 2008

This paper extends previous work on policy learning from the immediate reward case to episodic reinforcement learning, resulting in a general, common framework also connected to policy gradient methods and yielding a novel algorithm for policy learning that is particularly well-suited for dynamic motor primitives.

The Linear Programming Approach to Approximate Dynamic Programming

- Mathematics, Computer Science
- Oper. Res.
- 2003

An efficient method based on linear programming for approximating solutions to large-scale stochastic control problems that "fits" a linear combination of pre-selected basis functions to the dynamic programming cost-to-go function.