Corpus ID: 2984847

Relative Entropy Policy Search

@inproceedings{Peters2010RelativeEP,
  title={Relative Entropy Policy Search},
  author={Jan Peters and Katharina Muelling and Yasemin Altun},
  booktitle={AAAI},
  year={2010}
}
Policy search is a successful approach to reinforcement learning. However, policy improvements often result in the loss of information. Hence, it has been marred by premature convergence and implausible solutions. As first suggested in the context of covariant policy gradients (Bagnell and Schneider 2003), many of these problems may be addressed by constraining the information loss. In this paper, we continue this path of reasoning and suggest the Relative Entropy Policy Search (REPS) method…
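As a rough sketch of the constrained problem this describes (the notation below is assumed rather than quoted from the paper, and the paper's stationarity constraints on p_pi are omitted): REPS maximizes expected reward while bounding the relative entropy between the new state-action distribution and the previously observed one,

\[
\max_{\pi}\; \mathbb{E}_{(s,a)\sim p_{\pi}}\!\left[ r(s,a) \right]
\quad \text{s.t.} \quad
D_{\mathrm{KL}}\!\left( p_{\pi} \,\|\, q \right) \le \epsilon ,
\]

where q is the observed state-action distribution and \epsilon bounds the information loss per policy update. Solving the dual yields an exponential reweighting of observed samples, a form several of the follow-up papers below build on.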
Citations

Policy Search in Continuous Action Domains: an Overview
A broad survey of policy search methods is presented, providing a unified perspective on very different approaches, including also Bayesian Optimization and directed exploration methods.
Variational Bayesian Parameter-Based Policy Exploration
T. Hosino. 2020 International Joint Conference on Neural Networks (IJCNN), 2020.
An objective function is defined that explicitly accounts for reward uncertainty, and an algorithm is provided that uses a Bayesian method to optimize this function under uncertainty in the policy parameters, enabling parameter-based policy exploration in continuous state and action spaces.
Weighted Likelihood Policy Search with Model Selection
A novel direct policy search (DPS) method, weighted likelihood policy search (WLPS), in which a policy is learned efficiently through weighted likelihood estimation; a new criterion for model comparison in DPS, based on the weighted log-likelihood, is also developed.
Projections for Approximate Policy Iteration Algorithms
This paper proposes to improve over existing approximate policy iteration algorithms by introducing a set of projections that transform the constrained problem into an unconstrained one which is then solved by standard gradient descent.
Generalized exploration in policy search
This paper introduces a unifying view of step-based and episode-based exploration that allows for balanced trade-offs between the two; the exploration strategy is evaluated on four dynamical systems, showing that a more balanced trade-off can yield faster learning and better final policies.
Maximum a Posteriori Policy Optimisation
This work introduces a new algorithm for reinforcement learning called Maximum a posteriori Policy Optimisation (MPO), based on coordinate ascent on a relative entropy objective, and develops two off-policy algorithms that are competitive with the state of the art in deep reinforcement learning.
Relative Entropy Regularized Policy Iteration
An off-policy actor-critic algorithm for Reinforcement Learning (RL) that combines ideas from gradient-free optimization via stochastic search with a learned action-value function, and can be seen either as an extension of the Maximum a Posteriori Policy Optimisation algorithm (MPO) or as an addition to a policy iteration scheme.
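MPO and this relative-entropy-regularized scheme share REPS's exponential-reweighting step. A minimal toy sketch in Python (all names and values here are made up for illustration; the real algorithms optimize the temperature eta via a dual problem rather than fixing it):

import numpy as np

# Given advantage estimates for sampled actions, compute REPS/MPO-style
# weights: the solution of  max_w  sum_i w_i * adv_i  subject to a KL
# bound between w and the uniform distribution is  w_i ∝ exp(adv_i / eta)
# for the temperature eta that matches the bound.
adv = np.array([0.1, -0.3, 0.8, 0.2])  # made-up advantage estimates
eta = 0.5                              # fixed temperature for this sketch

w = np.exp(adv / eta)
w /= w.sum()                           # normalized exponential weights
print(w)                               # higher-advantage samples get more weight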
Policy Evaluation Networks
The empirical results demonstrate that combining these three elements (a learned Policy Evaluation Network, policy fingerprints, and gradient ascent) can produce policies that outperform those that generated the training data, in a zero-shot manner.
Relative Entropy Inverse Reinforcement Learning
This paper proposes a model-free inverse reinforcement learning (IRL) algorithm in which the relative entropy between the empirical distribution of state-action trajectories under a baseline policy and their distribution under the learned policy is minimized by stochastic gradient descent.
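A loose illustration of that gradient (synthetic data only; f_demo, f_base, and theta are hypothetical names, and the actual algorithm works with trajectory features and importance corrections from a real MDP): with a reward linear in trajectory features, the stochastic gradient is the expert feature mean minus an importance-weighted feature mean over baseline samples.

import numpy as np

rng = np.random.default_rng(0)

# Made-up trajectory feature vectors: f_demo from expert demonstrations,
# f_base from trajectories sampled under the baseline policy.
f_demo = rng.normal(loc=1.0, size=(50, 4))
f_base = rng.normal(loc=0.0, size=(500, 4))

theta = np.zeros(4)  # reward weights, r(tau) = theta @ f(tau)
lr = 0.05

for _ in range(200):
    # Importance weights of baseline samples under the learned
    # trajectory distribution p(tau) ∝ q(tau) * exp(theta @ f(tau)).
    w = np.exp(f_base @ theta)
    w /= w.sum()
    # Gradient: expert feature means minus model feature means.
    grad = f_demo.mean(axis=0) - w @ f_base
    theta += lr * grad

print("learned reward weights:", theta)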

References

(Showing 10 of 19 references.)
Covariant Policy Search
This work proposes a natural metric on controller parameterization that results from considering the manifold of probability distributions over paths induced by a stochastic controller, leading to a covariant gradient ascent rule.
Learning from Scarce Experience
A family of algorithms based on likelihood ratio estimation that use data gathered when executing one policy (or collection of policies) to estimate the value of a different policy; positive empirical results and a sample complexity bound are provided.
Policy Gradient Methods for Reinforcement Learning with Function Approximation
This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.
A Natural Policy Gradient
This work provides a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space, showing drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris.
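For context, the natural gradient preconditions the ordinary policy gradient with the inverse Fisher information matrix; in conventional notation (the symbols F, J, and \theta are standard usage, not quoted from this entry):

\[
\tilde{\nabla}_{\theta} J(\theta) = F(\theta)^{-1} \nabla_{\theta} J(\theta),
\qquad
F(\theta) = \mathbb{E}_{\pi_{\theta}}\!\left[ \nabla_{\theta}\log\pi_{\theta}(a\mid s)\, \nabla_{\theta}\log\pi_{\theta}(a\mid s)^{\top} \right].
\]

This is the same geometric idea developed in Covariant Policy Search, and the relative entropy constraint in REPS follows the same motivation of limiting information loss between successive policies.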
Variational methods for Reinforcement Learning
This work considers a Bayesian alternative that maintains a distribution over the transition dynamics, so that the resulting policy takes the agent's limited experience of the environment into account, and discusses two approximate solution methods, Variational Bayes and Expectation Propagation.
Natural Actor-Critic
This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing Amari's natural gradient…
Efficient reinforcement learning using Gaussian processes
First, PILCO, a fully Bayesian approach for efficient RL in continuous-valued state and action spaces when no expert knowledge is available, is introduced; then, principled algorithms for robust filtering and smoothing in GP dynamic systems are proposed.
Introduction to Reinforcement Learning
In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning.
Policy Search for Motor Primitives in Robotics
This paper extends previous work on policy learning from the immediate reward case to episodic reinforcement learning, resulting in a general, common framework also connected to policy gradient methods and yielding a novel algorithm for policy learning that is particularly well-suited for dynamic motor primitives.
The Linear Programming Approach to Approximate Dynamic Programming
An efficient method based on linear programming for approximating solutions to large-scale stochastic control problems, which fits a linear combination of pre-selected basis functions to the dynamic programming cost-to-go function.