Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition

@article{Dietterich2000HierarchicalRL,
  title={Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition},
  author={Thomas G. Dietterich},
  journal={ArXiv},
  year={2000},
  volume={cs.LG/9905014}
}
This paper presents a new approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. The decomposition, known as the MAXQ decomposition, has both a procedural semantics, as a subroutine hierarchy, and a declarative semantics, as a representation of the value function of a hierarchical policy. …
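Sketched from the paper's definitions, the decomposition splits the Q function of each subtask i into the value of the invoked child task a plus a completion function C that captures the reward for finishing i afterwards:

$$Q^{\pi}(i, s, a) = V^{\pi}(a, s) + C^{\pi}(i, s, a),$$
$$C^{\pi}(i, s, a) = \sum_{s', N} P^{\pi}(s', N \mid s, a)\, \gamma^{N}\, Q^{\pi}(i, s', \pi_i(s')),$$

where $V^{\pi}(i, s) = Q^{\pi}(i, s, \pi_i(s))$ for composite subtasks and the expected one-step reward for primitive actions. Unrolling the recursion down the hierarchy expresses the root value $V^{\pi}(0, s)$ as a sum of completion terms plus one primitive reward, which is the additive combination described above.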


Automatic Induction of MAXQ Hierarchies
TLDR
A hierarchy learning algorithm that uses given DBN action and reward models, as well as a single successful solution of a problem, to discover task-subtask hierarchies automatically.
Generating Hierarchical Structure in Reinforcement Learning from State Variables
TLDR
The CQ algorithm is presented, which decomposes and solves a Markov Decision Process (MDP) by automatically generating a hierarchy of smaller MDPs from state variables, using a "nested Markov ordering".
A Compact, Hierarchical Q-function Decomposition
TLDR
The solution is based on recursively decomposing the exit value function in terms of Q-functions at higher levels of the hierarchy, which leads to an intuitively appealing runtime architecture in which a parent subroutine passes to its child a value function on the exit states and the child reasons about how its choices affect the exit value.
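As an illustration of the runtime architecture just described, here is a minimal Python sketch; the interface names (actions, q_internal, exit_distribution) and the greedy one-step recursion are assumptions for exposition, not the authors' implementation.

def choose_exit_aware_action(child, state, exit_value):
    """Pick the child's action that maximizes its internal value-to-go
    plus the parent-supplied value of the exit state it leads to.

    exit_value: mapping from exit states to the parent's value there,
    passed down by the parent subroutine (hypothetical interface).
    """
    def score(action):
        # child's own estimate of reward accumulated until it exits
        internal = child.q_internal(state, action)
        # expected parent value at the exit state reached under this action
        exit_term = sum(p * exit_value.get(s_exit, 0.0)
                        for s_exit, p in child.exit_distribution(state, action))
        return internal + exit_term
    return max(child.actions(state), key=score)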
Approximate planning for bayesian hierarchical reinforcement learning
TLDR
Simulation results show that the algorithm exploiting the action hierarchy performs significantly better than flat Bayesian reinforcement learning in terms of both reward and, especially, solving time, by at least one order of magnitude.
Hierarchical Reinforcement Learning Based on Subgoal Discovery and Subpolicy Specialization
TLDR
The HASSLE (Hierarchical Assignment of Subgoals to Subpolicies LEarning) algorithm is proposed, in which high-level policies automatically discover subgoals and low-level policies learn to specialize for different subgoals.
Offline reinforcement learning with task hierarchies
TLDR
This work studies sample collection strategies for offline RL that are consistent with a provided task hierarchy while still providing good exploration of the state-action space, and shows more sample-efficient convergence to policies whose value is greater than or equal to that of hierarchically optimal policies found through an online hierarchical RL approach.
Globally Optimal Hierarchical Reinforcement Learning for Linearly-Solvable Markov Decision Processes
TLDR
This work presents a novel approach to hierarchical reinforcement learning for linearly-solvable Markov decision processes and shows that it is significantly more sample-efficient than a flat learner and similar hierarchical approaches when the set of boundary states is smaller than the entire state space.
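For background (standard linearly-solvable MDP theory, not specific to this paper), the Bellman equation becomes linear in the desirability function $z(s) = e^{-v(s)}$:

$$z(s) = e^{-q(s)} \sum_{s'} \bar{p}(s' \mid s)\, z(s') \quad \text{(interior states)}, \qquad z(s) = e^{-q(s)} \quad \text{(boundary states)},$$

where $q$ is the state cost and $\bar{p}$ the passive dynamics. Subproblems in such hierarchies couple only through the boundary states, which is why a small boundary set keeps them cheap to solve.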
Hierarchical control and learning for Markov decision processes
TLDR
This dissertation introduces the HAM for generating hierarchical, temporally abstract actions and shows that traditional MDP algorithms can be used to optimally refine HAMs for new tasks.
Hierarchical Reinforcement Learning with Integrated Discovery of Salient Subgoals
TLDR
In LIDOSS, the search space of a high-level policy can be reduced by focusing only on subgoal states with high saliency, and the results show that LIDOSS outperforms Hierarchical Actor-Critic (HAC), a state-of-the-art HRL method, on fixed-goal tasks.

References

Showing 1–10 of 58 references
The MAXQ Method for Hierarchical Reinforcement Learning
TLDR
The paper defines a hierarchical Q learning algorithm, proves its convergence, and shows experimentally that it can learn much faster than ordinary “flat” Q learning.
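A minimal sketch of the completion-function update at the core of that algorithm (MAXQ-Q), assuming tabular storage; the dictionary layout and the q_value/actions arguments are illustrative:

def maxq_q_update(C, q_value, actions, i, s, a, s_next, n_steps,
                  alpha=0.1, gamma=0.95):
    """After child task a, invoked in state s inside parent task i,
    runs for n_steps and terminates in s_next, move C(i, s, a) toward
    the discounted value of completing i greedily from s_next.
    q_value(i, s, a) is assumed to return V(a, s) + C(i, s, a).
    """
    best_next = max(q_value(i, s_next, a2) for a2 in actions)
    C[(i, s, a)] = ((1 - alpha) * C[(i, s, a)]
                    + alpha * (gamma ** n_steps) * best_next)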
Hierarchical control and learning for Markov decision processes
TLDR
This dissertation introduces the HAM for generating hierarchical, temporally abstract actions and shows that traditional MDP algorithms can be used to optimally refine HAMs for new tasks.
Multi-Value-Functions: Efficient Automatic Action Hierarchies for Multiple Goal MDPs
TLDR
This paper shows that, by using a new kind of automatically generated abstract action hierarchy, preparing for all N possible goals in an MDP with N states can be much cheaper than N times the work of preparing for one goal.
Between MDPs and Semi-MDPs: Learning, Planning, and Representing Knowledge at Multiple Temporal Scales
TLDR
It is argued that options and their models provide hitherto missing aspects of a powerful, clear, and expressive framework for representing and organizing knowledge.
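For context, the Bellman optimality equation over a set of options $\mathcal{O}$ takes the familiar SMDP form (standard notation, summarized rather than quoted):

$$V^{*}(s) = \max_{o \in \mathcal{O}(s)} \Big[ r(s, o) + \sum_{s'} p(s' \mid s, o)\, V^{*}(s') \Big],$$

where the option model $(r, p)$ folds the reward accumulated during the option and the discount over its random duration into a single step; restricting $\mathcal{O}$ to one-step primitive actions recovers the ordinary Bellman equation.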
Reinforcement Learning with Hierarchies of Machines
TLDR
This work presents provably convergent algorithms for problem-solving and learning with hierarchical machines and demonstrates their effectiveness on a problem with several thousand states.
Multi-time Models for Temporally Abstract Planning
TLDR
A more general form of temporally abstract model is introduced, the multi-time model, and its suitability for planning and learning by virtue of its relationship to the Bellman equations is established.
Hierarchical Explanation-Based Reinforcement Learning
TLDR
Hierarchical EBRL can effectively learn optimal policies in some sequential task domains even when the subgoals weakly interact with each other.
Decision-Theoretic Planning: Structural Assumptions and Computational Leverage
TLDR
This paper presents an overview and synthesis of MDP-related methods, showing how they provide a unifying framework for modeling many classes of planning problems studied in AI, and describes structural properties of MDPs that, when exhibited by particular classes of problems, can be exploited in the construction of optimal or approximately optimal policies or plans.
Hierarchical Solution of Markov Decision Processes using Macro-actions
TLDR
A hierarchical model (using an abstract MDP) that works with macro-actions only is proposed; it significantly reduces the size of the state space and is shown to justify the computational overhead of macro-action generation.