Hierarchies of Reward Machines

@article{FurelosBlanco2022HierarchiesOR,
  title={Hierarchies of Reward Machines},
  author={Daniel Furelos-Blanco and Mark Law and Anders Jonsson and Krysia Broda and A. Russo},
  journal={ArXiv},
  year={2022},
  volume={abs/2205.15752}
}
Reward machines (RMs) are a recent formalism for representing the reward function of a reinforcement learning task through a finite-state machine whose edges encode landmarks of the task using high-level events. The structure of RMs enables the decomposition of a task into simpler and independently solvable subtasks that help tackle long-horizon and/or sparse reward tasks. We propose a formalism for further abstracting the subtask structure by endowing an RM with the ability to call other RMs… 
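The abstract describes an RM as a finite-state machine whose edges fire on high-level events and emit reward, and a hierarchical variant whose edges may also call other RMs. The Python sketch below is a minimal illustration under those assumptions; the class, state, and task names are illustrative, not taken from the paper's implementation.

    class RewardMachine:
        """Finite-state machine whose edges fire on high-level events and return reward."""

        def __init__(self, initial_state, final_states, transitions):
            # transitions: (rm_state, event) -> (next_rm_state, reward)
            self.initial_state = initial_state
            self.final_states = final_states
            self.transitions = transitions

        def step(self, rm_state, event):
            # If no edge matches the observed event, stay put with zero reward.
            return self.transitions.get((rm_state, event), (rm_state, 0.0))


    # Flat RM for a "pick up coffee, then deliver it to the office" task.
    coffee_task = RewardMachine(
        initial_state="u0",
        final_states={"u_acc"},
        transitions={
            ("u0", "got_coffee"): ("u1", 0.0),
            ("u1", "at_office"): ("u_acc", 1.0),
        },
    )

    # Hierarchical variant: an edge may be labelled with a call to another RM;
    # the callee must reach one of its accepting states before the calling
    # machine takes that edge and collects its reward.
    deliver_mail = RewardMachine("v0", {"v_acc"}, {("v0", "got_mail"): ("v_acc", 0.0)})

    office_routine = RewardMachine(
        initial_state="w0",
        final_states={"w_acc"},
        transitions={
            ("w0", "call:coffee_task"): ("w1", 0.0),
            ("w1", "call:deliver_mail"): ("w_acc", 1.0),
        },
    )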

References

Learning Reward Machines for Partially Observable Reinforcement Learning

It is shown that RMs can be learned from experience, instead of being specified by the user, and that the resulting problem decomposition can be used to effectively solve partially observable RL problems.

Reward Machines for Cooperative Multi-Agent Reinforcement Learning

The proposed novel interpretation of RMs in the multi-agent setting explicitly encodes required teammate interdependencies and independencies, allowing the team-level task to be decomposed into sub-tasks for individual agents, and provides a natural approach to decentralized learning.

Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning

This paper proposes reward machines, a type of finite state machine that supports the specification of reward functions while exposing reward function structure, and describes different methodologies to exploit this structure to support learning, including automated reward shaping, task decomposition, and counterfactual reasoning with off-policy learning.

Joint Inference of Reward Machines and Policies for Reinforcement Learning

An iterative algorithm is presented that performs joint inference of reward machines and policies for RL (specifically, Q-learning), and the algorithm is proved to converge almost surely to an optimal policy in the limit.

LTL and Beyond: Formal Languages for Reward Function Specification in Reinforcement Learning

This work proposes using reward machines (RMs), which are automata-based representations that expose reward function structure, as a normal form representation for reward functions, to ease the burden of complex reward function specification.

Modular Multitask Reinforcement Learning with Policy Sketches

Experiments show that using the approach to learn policies guided by sketches gives better performance than existing techniques for learning task-specific or shared policies, while naturally inducing a library of interpretable primitive behaviors that can be recombined to rapidly adapt to new tasks.

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

h-DQN is presented, a framework that integrates hierarchical value functions operating at different temporal scales with intrinsically motivated deep reinforcement learning, allowing for flexible goal specifications such as functions over entities and relations.

DeepSynth: Automata Synthesis for Automatic Task Segmentation in Deep Reinforcement Learning

This paper proposes DeepSynth, a method for effective training of deep reinforcement learning (RL) agents when the reward is sparse and non-Markovian, but progress towards the reward requires achieving an unknown sequence of high-level objectives.

Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning

Q-Learning for Reward Machines is presented, an algorithm that appropriately decomposes the reward machine and uses off-policy Q-learning to simultaneously learn subpolicies for its different components; it is guaranteed to converge to an optimal policy in the tabular case.
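A minimal sketch of the counterfactual update enabled by this decomposition, assuming tabular Q-functions and the RewardMachine sketch given after the abstract above; function and variable names are illustrative, not the authors' code. Each experience tuple updates one Q-table per RM state, as if the agent had been in that RM state when the event occurred.

    from collections import defaultdict

    def rm_states(rm):
        # Collect all RM states that appear in the transition table.
        states = {rm.initial_state} | set(rm.final_states)
        for (u, _event), (u_next, _reward) in rm.transitions.items():
            states |= {u, u_next}
        return states

    def qrm_update(Q, rm, s, a, s_next, event, actions, alpha=0.1, gamma=0.99):
        # Counterfactual off-policy update: one tabular Q-function per RM state.
        for u in rm_states(rm):
            u_next, reward = rm.step(u, event)       # what the RM would do from u
            target = reward
            if u_next not in rm.final_states:        # bootstrap unless the RM accepts
                target += gamma * max(Q[u_next][(s_next, b)] for b in actions)
            Q[u][(s, a)] += alpha * (target - Q[u][(s, a)])

    # Q maps each RM state to a tabular Q-function over (environment state, action).
    Q = defaultdict(lambda: defaultdict(float))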

Learning Multi-Level Hierarchies with Hindsight

A new Hierarchical Reinforcement Learning (HRL) framework is introduced that overcomes the instability issues arising when agents try to jointly learn multiple levels of policies, and is the first to successfully learn 3-level hierarchies in parallel on tasks with continuous state and action spaces.
...