Corpus ID: 11711975

Hierarchical Relative Entropy Policy Search

@article{Daniel2012HierarchicalRE,
  title={Hierarchical Relative Entropy Policy Search},
  author={Christian Daniel and Gerhard Neumann and Jan Peters},
  journal={J. Mach. Learn. Res.},
  year={2016},
  volume={17},
  pages={93:1-93:50}
}
Many reinforcement learning (RL) tasks, especially in robotics, consist of multiple sub-tasks that are strongly structured. Such task structures can be exploited by incorporating hierarchical policies that consist of gating networks and sub-policies. However, this concept has only been partially explored in real-world settings, and complete methods derived from first principles are needed. Real-world settings are challenging due to large and continuous state-action spaces that are prohibitive… 
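
The abstract's core construction, a hierarchical policy in which a gating network selects a sub-policy and the selected sub-policy emits the action, with updates constrained by a bound on the relative entropy (KL divergence) to the old sample distribution, can be made concrete with a short sketch. The following is a minimal Python/NumPy illustration under simplifying assumptions (linear-Gaussian sub-policies, a softmax gate, and a bisection search for the REPS temperature instead of the full dual optimization); all names are hypothetical, and this is a sketch of the general pattern, not the paper's implementation.

import numpy as np

rng = np.random.default_rng(0)

class HierarchicalPolicy:
    """Illustrative hierarchical policy: softmax gate over linear-Gaussian sub-policies."""
    def __init__(self, n_options, state_dim, action_dim):
        # Gating network: softmax over options, linear in the state.
        self.gate_W = rng.normal(scale=0.1, size=(n_options, state_dim))
        # Sub-policies: a ~ N(K_o s, sigma^2 I) for option o.
        self.K = rng.normal(scale=0.1, size=(n_options, action_dim, state_dim))
        self.sigma = 0.5

    def sample_option(self, s):
        logits = self.gate_W @ s
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return rng.choice(len(p), p=p)

    def sample_action(self, s, o):
        return self.K[o] @ s + self.sigma * rng.normal(size=self.K.shape[1])

def reps_weights(returns, epsilon=0.5):
    """REPS-style sample weights w_i ∝ exp(A_i / eta), with eta chosen so the
    weighted distribution stays within KL bound epsilon of the (uniform) old one.
    Simplified: bisection on eta rather than optimizing the full REPS dual."""
    A = returns - returns.mean()
    def kl(eta):
        w = np.exp((A - A.max()) / eta)  # shift by max for numerical stability
        w /= w.sum()
        # KL(new || old) with the old distribution uniform over the N samples
        return float(np.sum(w * np.log(w * len(w) + 1e-12)))
    lo, hi = 1e-3, 1e3
    for _ in range(60):  # KL is decreasing in eta, so geometric bisection works
        eta = np.sqrt(lo * hi)
        if kl(eta) > epsilon:
            lo = eta
        else:
            hi = eta
    w = np.exp((A - A.max()) / eta)
    return w / w.sum()

# Usage sketch: sample, score, weight.
pi = HierarchicalPolicy(n_options=3, state_dim=4, action_dim=2)
s = rng.normal(size=4)
a = pi.sample_action(s, pi.sample_option(s))
w = reps_weights(rng.normal(size=100), epsilon=0.5)

In an episodic REPS-style loop, the resulting weights would be used to re-fit the gate and the sub-policies by weighted maximum likelihood before collecting new samples.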

Citations

Layered direct policy search for learning hierarchical skills
TLDR
A new HRL algorithm based on information theoretic principles to autonomously uncover a diverse set of sub-policies and their activation policies is proposed and evaluated on two high dimensional continuous tasks.
Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization
TLDR
This paper proposes an HRL method that learns a latent variable of a hierarchical policy using mutual information maximization and introduces advantage-weighted importance sampling to learn option policies that correspond to modes of the advantage function.
Probabilistic inference for determining options in reinforcement learning
TLDR
The proposed approach is based on parametric option representations and works well in combination with current policy search methods, which are particularly well suited for continuous real-world tasks.
Hierarchical Policy Search via Return-Weighted Density Estimation
TLDR
This paper proposes a novel method, hierarchical policy search via return-weighted density estimation (HPSDE), which efficiently identifies the modes through density estimation with return-weighted importance sampling and automatically determines the number and location of option policies, significantly reducing the burden of hyperparameter tuning.
Data-Efficient Hierarchical Reinforcement Learning
TLDR
This paper studies how to develop HRL algorithms that are general, in that they do not make onerous additional assumptions beyond standard RL algorithms, and efficient, in the sense that they can be used with modest numbers of interaction samples, making them suitable for real-world problems such as robotic control.
Hierarchical Reinforcement Learning for Concurrent Discovery of Compound and Composable Policies
TLDR
An algorithm is proposed for learning both compound and composable policies within the same learning process by exploiting the off-policy data generated from the compound policy; the method builds on a maximum entropy RL approach to favor exploration during learning.
...

References

Showing 1-10 of 64 references
Hierarchical Policy Gradient Algorithms
TLDR
This paper proposes a family of hierarchical policy gradient algorithms for problems with continuous state and/or action spaces, and introduces a class of hierarchical hybrid algorithms in which a group of subtasks is formulated as value-function-based RL problems and the others as policy gradient RL (PGRL) problems.
Recent Advances in Hierarchical Reinforcement Learning
TLDR
This work reviews several approaches to temporal abstraction and hierarchical organization that machine learning researchers have recently developed and discusses extensions of these ideas to concurrent activities, multiagent coordination, and hierarchical memory for addressing partial observability.
Data-Efficient Generalization of Robot Skills with Contextual Policy Search
TLDR
This work proposes a new model-based policy search approach that can also learn contextual upper-level policies and achieves a substantial improvement in learning speed compared to existing methods on simulated and real robotic tasks.
Infinite-Horizon Policy-Gradient Estimation
TLDR
GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies, is introduced.
Variational Inference for Policy Search in changing situations
TLDR
Variational Inference for Policy Search (VIP) has several interesting properties and matches the performance of state-of-the-art methods while being applicable to learning in multiple situations simultaneously.
Unified Inter and Intra Options Learning Using Policy Gradient Methods
TLDR
This paper proposes a modular parameterization of intra-option policies together with option termination conditions and the option selection policy (inter options), and shows that these three decision components may be viewed as a unified policy over an augmented state-action space, to which standard policy gradient algorithms may be applied.
Bayesian Policy Search with Policy Priors
TLDR
This work casts Markov chain Monte Carlo as a stochastic, hill-climbing policy search algorithm that learns to learn a structured policy efficiently, and shows how inference over the latent variables in these policy priors enables intra- and inter-task transfer of abstract knowledge.
Policy Improvement Methods: Between Black-Box Optimization and Episodic Reinforcement Learning
TLDR
It is shown that PI^BB is a black-box optimization (BBO) algorithm and, more specifically, a special case of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES); the simpler PI^BB outperforms PI^2 on simple evaluation tasks in terms of convergence speed and final cost.
Policy search for motor primitives in robotics
TLDR
A novel EM-inspired algorithm for policy learning, particularly well suited for dynamical-system motor primitives, is introduced and applied in the context of motor learning; it can learn a complex Ball-in-a-Cup task on a real Barrett WAM™ robot arm.
...