Corpus ID: 231925127

Discovery of Options via Meta-Learned Subgoals

@inproceedings{Veeriah2021DiscoveryOO,
  title={Discovery of Options via Meta-Learned Subgoals},
  author={Vivek Veeriah and Tom Zahavy and Matteo Hessel and Zhongwen Xu and Junhyuk Oh and Iurii Kemaev and Hado van Hasselt and David Silver and Satinder Singh},
  booktitle={Neural Information Processing Systems},
}
Temporal abstractions in the form of options have been shown to help reinforcement learning (RL) agents learn faster. However, despite prior work on this topic, the problem of discovering options through interaction with an environment remains a challenge. In this paper, we introduce a novel meta-gradient approach for discovering useful options in multi-task RL environments. Our approach is based on a manager-worker decomposition of the RL agent, in which a manager maximises rewards from the… 
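The manager-worker decomposition can be illustrated with a minimal tabular sketch. Everything below (the chain environment, the fixed subgoal reward vectors, the hyperparameters) is hypothetical: in the paper the subgoal cumulants are themselves meta-learned, whereas here they are fixed by hand so that only the worker-learning half of the idea is shown.

```python
import numpy as np

# Minimal sketch of a manager-worker decomposition on a 1-D chain.
# Each option o has a "subgoal" reward vector (fixed here; in the paper
# these cumulants are meta-learned). The worker for option o learns to
# maximise its subgoal reward; a manager would then learn values over
# options against the extrinsic reward. All names are illustrative.

N_STATES, N_OPTIONS, N_ACTIONS = 5, 2, 2   # actions: 0 = left, 1 = right

# Hypothetical subgoals: option 0 targets state 0, option 1 targets state 4.
subgoal_reward = np.zeros((N_OPTIONS, N_STATES))
subgoal_reward[0, 0] = 1.0
subgoal_reward[1, N_STATES - 1] = 1.0

def step(s, a):
    return max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)

# Worker: tabular Q-learning per option against its own subgoal reward.
worker_q = np.zeros((N_OPTIONS, N_STATES, N_ACTIONS))
rng = np.random.default_rng(0)
for _ in range(5000):
    o, s = rng.integers(N_OPTIONS), rng.integers(N_STATES)
    a = rng.integers(N_ACTIONS)
    s2 = step(s, a)
    target = subgoal_reward[o, s2] + 0.9 * worker_q[o, s2].max()
    worker_q[o, s, a] += 0.1 * (target - worker_q[o, s, a])

print("worker 1 prefers action", np.argmax(worker_q[1, 2]), "at mid-chain")
```

After training, each worker's greedy policy moves toward its own subgoal state, which is the property the manager exploits when composing options.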

Reusable Options through Gradient-based Meta Learning

This work forms the desiderata for reusable options and uses these to frame the problem of learning options as a gradient-based meta-learning problem, and formulates an objective that explicitly incentivizes options which allow a higher-level decision maker to adjust in few steps to different tasks.

Toward Discovering Options that Achieve Faster Planning

A new objective for option discovery is proposed that emphasizes the computational advantage of using options in planning for a given set of episodic tasks and a given number of options, together with an algorithm that better optimizes this objective.

Deep Hierarchical Planning from Pixels

Director is introduced, a practical method for learning hierarchical behaviors directly from pixels by planning inside the latent space of a learned world model, and the decisions are interpretable because the world model can decode goals into images for visualization.

Automated Reinforcement Learning (AutoRL): A Survey and Open Problems

This survey seeks to unify the field of AutoRL, provide a common taxonomy, discuss each area in detail and pose open problems of interest to researchers going forward.

Continual Auxiliary Task Learning

This work investigates a reinforcement learning system designed to learn a collection of auxiliary tasks, with a behavior policy learning to take actions to improve those auxiliary predictions, and develops an algorithm based on successor features that facilitates tracking under non-stationary rewards.
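The successor-feature idea behind this kind of tracking can be sketched in a few lines. The example below is illustrative rather than the paper's algorithm: with a fixed policy on a small cycle, the successor features are learned once by TD, and the value under a *new* reward weight vector is obtained by a dot product with no relearning.

```python
import numpy as np

# Sketch of successor features for tracking under changing rewards
# (illustrative; not the paper's exact algorithm). With a fixed policy,
# psi(s) = E[sum_t gamma^t phi(S_t) | S_0 = s] is learned once by TD;
# the value under ANY reward r(s) = phi(s)·w is then v(s) = psi(s)·w,
# so a change of w needs no further learning.

N = 4
phi = np.eye(N)                      # one-hot state features
gamma = 0.5
P = np.roll(np.eye(N), 1, axis=1)    # fixed policy: deterministic cycle 0->1->2->3->0

psi = np.zeros((N, N))
for _ in range(500):                 # TD(0) sweeps: psi(s) <- phi(s) + gamma*psi(s')
    for s in range(N):
        s2 = np.argmax(P[s])
        psi[s] += 0.5 * (phi[s] + gamma * psi[s2] - psi[s])

w_new = np.array([0.0, 0.0, 0.0, 1.0])   # reward moves to state 3
v_new = psi @ w_new                       # values for the new reward, instantly
print(v_new)
```

On the cycle, state 3 is revisited every 4 steps, so its value under the new reward is 1/(1 - gamma^4), which the learned features reproduce.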

Average-Reward Learning and Planning with Options

This work extends the options framework for temporal abstraction in reinforcement learning from discounted Markov decision processes (MDPs) to average-reward MDPs, and extends the notion of option-interrupting behavior from the discounted to the average-reward formulation.
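The core change in the average-reward setting can be sketched on a toy semi-MDP. The example below is illustrative, not the paper's exact algorithm: an option lasting tau steps with cumulative reward R produces the TD error R - rho*tau + q(s') - q(s), where rho is a learned estimate of the long-run reward rate.

```python
import numpy as np

# Tiny sketch of differential (average-reward) Q-learning with options
# (illustrative). Two states, one option each:
#   A --(tau=2 steps, R=2)--> B --(tau=1 step, R=0)--> A
# so the long-run reward rate is 2 reward / 3 steps = 2/3.
transitions = {0: (1, 2.0, 2), 1: (0, 0.0, 1)}   # s -> (s', R, tau)

q = np.zeros(2)
rho = 0.0
s = 0
for _ in range(5000):
    s2, R, tau = transitions[s]
    delta = R - rho * tau + q[s2] - q[s]          # rho*tau replaces discounting
    q[s] += 0.05 * delta
    rho += 0.01 * delta                           # reward-rate estimate
    s = s2

print(round(rho, 3))   # approaches the true rate 2/3
```

The rho*tau term is what makes the update sensitive to how long an option runs, which discounted SMDP updates handle through gamma^tau instead.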

Meta-Gradients in Non-Stationary Environments

It is suggested that contextualising meta-gradients can play a pivotal role in extracting high performance from meta-gradients in non-stationary settings, and it is investigated whether meta-gradient methods provide a greater advantage in highly non-stationary environments.

Settling the Bias and Variance of Meta-Gradient Estimation for Meta-Reinforcement Learning

It is argued that the stochastic meta-gradient estimation adopted by many existing MGRL methods is in fact biased; the bias comes from two sources: the compositional bias inherent in the structure of compositional optimisation problems, and the bias of multi-step Hessian estimation caused by direct automatic differentiation.

Subgoal Search For Complex Reasoning Tasks

It is shown that a simple approach of generating k-step-ahead subgoals is surprisingly efficient on three challenging domains: two popular puzzle games, Sokoban and the Rubik's Cube, and the inequality-proving benchmark INT.
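The k-step-ahead idea can be sketched with plain graph search. The example below is illustrative only: the paper learns a generative model that proposes subgoals, whereas here the "generator" simply enumerates all states reachable in exactly k low-level steps on a toy line domain, and high-level BFS strings those subgoals together.

```python
from collections import deque

# Sketch of k-step-ahead subgoal search (illustrative; the paper uses a
# *learned* subgoal generator, here replaced by exhaustive enumeration).

def neighbors(s):
    # toy domain: integers 0..20 with moves of -1 / +1
    return [t for t in (s - 1, s + 1) if 0 <= t <= 20]

def k_step_subgoals(s, k):
    # all states reachable in exactly k low-level steps: the "generator"
    frontier = {s}
    for _ in range(k):
        frontier = {t for u in frontier for t in neighbors(u)}
    return frontier

def subgoal_search(start, goal, k=4):
    # BFS in subgoal space: each high-level edge is a k-step jump
    parent, queue = {start: None}, deque([start])
    while queue:
        s = queue.popleft()
        if s == goal:
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1]          # subgoals, k low-level steps apart
        for g in k_step_subgoals(s, k):
            if g not in parent:
                parent[g] = s
                queue.append(g)
    return None

print(subgoal_search(0, 20, k=4))      # -> [0, 4, 8, 12, 16, 20]
```

The high-level search only ever branches over subgoals, so its depth shrinks by roughly a factor of k compared with searching over primitive moves.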

Hierarchical Representation Learning for Markov Decision Processes

A novel method for learning hierarchical representations of Markov decision processes by partitioning the state space into subsets, and defines subtasks for performing transitions between the partitions, which is suitable for high-dimensional problems with large state spaces.

Probabilistic inference for determining options in reinforcement learning

The proposed approach is based on parametric option representations and works well in combination with current policy search methods, which are particularly well suited for continuous real-world tasks.

Eigenoption Discovery through the Deep Successor Representation

This paper proposes an algorithm that discovers eigenoptions while learning non-linear state representations from raw pixels, and exploits recent successes in the deep reinforcement learning literature and the equivalence between proto-value functions and the successor representation.

A Laplacian Framework for Option Discovery in Reinforcement Learning

This paper addresses the option discovery problem by showing how proto-value functions (PVFs) implicitly define options, introducing eigenpurposes, intrinsic reward functions derived from the learned representations that traverse the principal directions of the state space.
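An eigenpurpose is easy to compute explicitly on a small graph. The sketch below is illustrative (a ring graph standing in for an environment's state-transition graph): an eigenvector e of the graph Laplacian defines the intrinsic reward r(s, s') = e[s'] - e[s], and maximising it pushes the agent along one principal direction of the state space.

```python
import numpy as np

# Sketch of eigenpurposes from the graph Laplacian of a small ring graph
# (illustrative). Eigenvectors of L are proto-value functions; each one
# induces an intrinsic reward whose option traverses a principal
# direction of the state space.

N = 8
A = np.zeros((N, N))
for s in range(N):                      # ring: s <-> s+1 (mod N)
    A[s, (s + 1) % N] = A[(s + 1) % N, s] = 1.0
L = np.diag(A.sum(1)) - A               # combinatorial graph Laplacian

vals, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
e = vecs[:, 1]                          # first non-constant eigenvector (a PVF)

def eigenpurpose_reward(s, s2):
    return e[s2] - e[s]                 # intrinsic reward along direction e
```

Because the reward telescopes, any closed loop collects zero intrinsic reward, so the induced option terminates where e stops increasing, i.e. at a "peak" of the eigenvector.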

Meta Learning Shared Hierarchies

A metalearning approach for learning hierarchically structured policies, improving sample efficiency on unseen tasks through the use of shared primitives (policies that are executed for large numbers of timesteps), and provides a concrete metric for measuring the strength of such hierarchies.

PAC-inspired Option Discovery in Lifelong Reinforcement Learning

This work provides the first formal analysis of the sample complexity, a measure of learning speed, of reinforcement learning with options, and inspires a novel option-discovery algorithm that aims at minimizing overall sample complexity in lifelong reinforcement learning.

Building Portable Options: Skill Transfer in Reinforcement Learning

This work introduces the notion of learning options in agent-space, the space generated by a feature set that is present and retains the same semantics across successive problem instances, rather than in problem-space.

Discovery of Useful Questions as Auxiliary Tasks

This work presents a novel method for a reinforcement learning (RL) agent to discover questions formulated as general value functions or GVFs, a fairly rich form of knowledge representation, and shows how such auxiliary tasks can improve the data efficiency of an actor-critic agent.

The Option-Critic Architecture

This work derives policy gradient theorems for options and proposes a new option-critic architecture capable of learning both the internal policies and the termination conditions of options, in tandem with the policy over options, and without the need to provide any additional rewards or subgoals.
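The termination half of the option-critic gradient theorem has a simple sign structure that a few lines can illustrate. The sketch below is not the full architecture; it shows one termination parameter (with the critic estimates fixed by hand as hypothetical values) being updated so that a currently advantageous option terminates less often.

```python
import numpy as np

# Sketch of the option-critic termination gradient (illustrative). With
# termination probability beta = sigmoid(theta), the gradient theorem
# moves theta toward *lower* termination whenever the option's advantage
# A(s', o) = Q(s', o) - V(s') is positive, i.e. the option is still
# worth running. Critic values here are hypothetical constants.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

theta = 0.0                              # termination logit for one option
lr = 0.5
q_so, v_s = 1.2, 1.0                     # hypothetical critic estimates at s'
advantage = q_so - v_s                   # positive: option still good

for _ in range(20):
    beta = sigmoid(theta)
    # gradient of beta * advantage w.r.t. theta, descended:
    theta -= lr * beta * (1.0 - beta) * advantage

print(sigmoid(theta))                    # termination probability below 0.5
```

With a negative advantage the same update raises beta, so options that stop being useful learn to hand control back to the policy over options.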

On Learning Intrinsic Rewards for Policy Gradient Methods

This paper derives a novel algorithm for learning intrinsic rewards for policy-gradient-based learning agents, and compares augmented agents that use this algorithm to provide additive intrinsic rewards to A2C-based and PPO-based policy learners against baseline agents that use the same policy learners but with only extrinsic rewards.
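The reward-augmentation half of this setup can be sketched on a bandit. The example is illustrative only: in the paper the intrinsic reward is itself learned by differentiating through the policy update, whereas here it is a fixed hypothetical bonus, so only the "policy learns from extrinsic plus intrinsic reward" structure is shown.

```python
import numpy as np

# Sketch of a policy-gradient learner trained on an augmented return
# (illustrative; the intrinsic reward r_in is fixed here, not learned).

rng = np.random.default_rng(0)
logits = np.zeros(2)                     # softmax policy over two arms
r_ex = np.array([0.0, 1.0])              # extrinsic reward per arm
r_in = np.array([0.1, 0.3])              # hypothetical intrinsic bonus

for _ in range(500):
    p = np.exp(logits) / np.exp(logits).sum()
    a = rng.choice(2, p=p)
    total = r_ex[a] + r_in[a]            # augmented return used for learning
    grad = -p                            # REINFORCE: grad of log pi(a)
    grad[a] += 1.0
    logits += 0.1 * total * grad

print(np.exp(logits) / np.exp(logits).sum())
```

The baseline comparison in the paper corresponds to running the same loop with `total = r_ex[a]` alone.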