Recent Advances in Hierarchical Reinforcement Learning

Andrew G. Barto and Sridhar Mahadevan, Discrete Event Dynamic Systems

Reinforcement learning is bedeviled by the curse of dimensionality: the number of parameters to be learned grows exponentially with the size of any compact encoding of a state. Recent attempts to combat the curse of dimensionality have turned to principled ways of exploiting temporal abstraction, where decisions are not required at each step, but rather invoke the execution of temporally-extended activities which follow their own policies until termination. This leads naturally to hierarchical… 

Learning state and action space hierarchies for reinforcement learning using action-dependent partitioning

This dissertation reviews several approaches to temporal abstraction and hierarchical organization that machine learning researchers have recently developed and presents a new method for the autonomous construction of hierarchical action and state representations in reinforcement learning, aimed at accelerating learning and extending the scope of such systems.

Efficient Reinforcement Learning with Hierarchies of Machines by Leveraging Internal Transitions

A new hierarchical reinforcement learning algorithm is proposed that automatically discovers internal transitions, where one machine calls another with the environment state unchanged, and short-circuits them recursively in the computation of Q-values.

Hierarchical Reinforcement Learning: A Survey and Open Research Challenges

This survey paper introduces a selection of problem-specific approaches, which provided insight into how to utilize often handcrafted abstractions in specific task settings, and presents the Options framework, a more generic approach that allows abstractions to be discovered and learned semi-automatically.
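The Options framework mentioned above models a temporally extended action as a triple: an initiation set, an intra-option policy, and a termination condition. A minimal sketch of that triple follows; the class and function names, and the `step` environment hook, are illustrative assumptions rather than the survey's own code.

```python
import random
from dataclasses import dataclass
from typing import Callable, Set

State = int
Action = int

@dataclass
class Option:
    """An option in the sense of the Options framework (names assumed)."""
    initiation_set: Set[State]             # I: states where the option may start
    policy: Callable[[State], Action]      # pi: intra-option policy
    termination: Callable[[State], float]  # beta(s): probability of terminating in s

def run_option(option: Option, state: State,
               step: Callable[[State, Action], State],
               max_steps: int = 100) -> State:
    """Execute an option until its termination condition fires.

    `step(state, action) -> next_state` is an assumed environment hook.
    Returns the state in which the option terminated.
    """
    assert state in option.initiation_set, "option not available in this state"
    for _ in range(max_steps):
        state = step(state, option.policy(state))
        if random.random() < option.termination(state):
            break
    return state
```

For example, on a corridor of integer states, an option "move right until reaching state 3" would use `policy=lambda s: 1` and `termination=lambda s: 1.0 if s >= 3 else 0.0`.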

Fast reinforcement learning with generalized policy updates

It is argued that complex decision problems can be naturally decomposed into multiple tasks that unfold in sequence or in parallel, and that associating each task with a reward function can be seamlessly accommodated within the standard reinforcement-learning formalism.
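One concrete mechanism behind this line of work is generalized policy improvement: given action-value functions for several tasks, act greedily with respect to their pointwise maximum. The sketch below is an assumed illustration of that idea using plain nested lists as Q-tables, not code from the paper.

```python
def gpi_action(q_tables, state):
    """Generalized policy improvement over several tasks (illustrative).

    q_tables: list of Q[state][action] tables, one per task.
    Returns the action maximizing max_i Q_i(state, action).
    """
    n_actions = len(q_tables[0][state])
    return max(range(n_actions),
               key=lambda a: max(q[state][a] for q in q_tables))
```

With two single-state tasks, `q1 = [[1.0, 0.0]]` and `q2 = [[0.0, 2.0]]`, the combined greedy choice is action 1, since the second task's Q-value dominates.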

Optimal Time Scales for Reinforcement Learning Behaviour Strategies

This thesis derives gradient descent-based algorithms for learning optimal termination conditions of options, based on a new option termination representation, and incorporates the proposed approach into policy-gradient methods with linear function approximation.

Masters Thesis: Hierarchical Reinforcement Learning for Spatio-temporal Planning

A novel algorithm for learning the hierarchical structure of a discrete-state, goal-oriented factored MDP (FMDP) is proposed in this thesis, taking into account the causal structure of the problem domain through a Dynamic Bayesian Network (DBN) model.

On Efficiency in Hierarchical Reinforcement Learning

This paper formalizes the intuition that HRL can exploit repeating "sub-MDPs" with similar reward and transition structure, and establishes conditions under which planning with structure-induced options is near-optimal and computationally efficient.

A Deep Hierarchical Reinforcement Learning Algorithm in Partially Observable Markov Decision Processes

This paper proposes a deep hierarchical reinforcement learning algorithm for hierarchical POMDPs, in which tasks are only partially observable and possess hierarchical structure.

Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning?

This work isolates and evaluates the claimed benefits of hierarchical RL on a suite of tasks encompassing locomotion, navigation, and manipulation and finds that most of the observed benefits of hierarchy can be attributed to improved exploration, as opposed to easier policy learning or imposed hierarchical structures.

Reinforcement Learning with a Hierarchy of Abstract Models

Simulations on a set of compositionally-structured navigation tasks show that H-DYNA can learn to solve them faster than conventional RL algorithms, and the abstract models can be used to solve stochastic control tasks.

Finding Structure in Reinforcement Learning

SKILLS discovers skills, partially defined action policies that arise in the context of multiple related tasks, by minimizing the description length of the policies' representation.

Continuous-Time Hierarchical Reinforcement Learning

This paper generalizes the MAXQ method to continuous-time discounted and average-reward SMDP models, describes two hierarchical reinforcement learning algorithms applied to a complex multiagent AGV scheduling problem, and compares their performance and speed with each other and with several well-known AGV scheduling heuristics.

Hierarchical Memory-Based Reinforcement Learning

This paper shows how a hierarchy of behaviors can be used to create and select among variable length short-term memories appropriate for a task, and formalizes this idea in a framework called Hierarchical Suffix Memory (HSM).

Reinforcement Learning: An Introduction

This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

Localizing Search in Reinforcement Learning

A new RL method, Boundary Localized Reinforcement Learning (BLRL), is proposed, which maps RL into a mode switching problem where an agent deterministically chooses an action based on its state, and limits stochastic search to small areas around mode boundaries, drastically reducing computational cost.

Programmable Reinforcement Learning Agents

Together, the methods presented in this work comprise a system for agent design that allows programmers to specify what they know, hint at what they suspect using soft shaping, and leave unspecified what they do not know. The system then optimally completes the program through experience and exploits the hierarchical structure of the specified program to speed learning.


Q-learning

This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
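The theorem's conditions, discretely represented action-values and repeated sampling of every action in every state, can be seen in a minimal tabular Q-learning loop. The toy chain MDP below (move left or right toward a goal state) is an assumption for illustration, not an example from the paper.

```python
import random

def q_learning(n_states=4, n_actions=2, episodes=2000,
               alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on an assumed deterministic chain MDP.

    Action 1 moves right, action 0 moves left; reaching the last state
    yields reward 1 and ends the episode. Uniform-random action choice
    ensures every (state, action) pair is sampled repeatedly, as the
    convergence theorem requires.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            a = rng.randrange(n_actions)  # exploration covers all actions
            s2 = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == goal else 0.0
            # Standard one-step Q-learning update
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

With deterministic transitions the table converges to the optimal values; for instance, the state adjacent to the goal learns a value near 1.0 for the rightward action, and earlier states learn geometrically discounted values.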

Temporal abstraction in reinforcement learning

A general framework for prediction, control, and learning at multiple temporal scales is developed, showing how multi-time models can be used to produce plans of behavior very quickly using classical dynamic programming or reinforcement learning techniques.