Corpus ID: 239885485

Landmark-Guided Subgoal Generation in Hierarchical Reinforcement Learning

Junsu Kim, Younggyo Seo, Jinwoo Shin
Goal-conditioned hierarchical reinforcement learning (HRL) has shown promising results on complex, long-horizon RL tasks. However, the action space of the high-level policy in goal-conditioned HRL is often large, which results in poor exploration and inefficient training. In this paper, we present HIerarchical reinforcement learning Guided by Landmarks (HIGL), a novel framework for training a high-level policy with a reduced action space guided by landmarks, i.e…


Exploration by Random Network Distillation
Introduces an exploration bonus for deep reinforcement learning that is easy to implement and adds minimal computational overhead, together with a method for flexibly combining intrinsic and extrinsic rewards; the combination enables significant progress on several hard-exploration Atari games.
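A minimal sketch of the random-network-distillation idea behind this bonus, not the paper's implementation: both networks are reduced to linear maps for brevity, and all names here are illustrative. A frozen random "target" network embeds each observation; a trainable "predictor" regresses onto that embedding, and the prediction error serves as the intrinsic bonus, so states visited often become predictable and earn less bonus.

```python
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM, EMB_DIM = 4, 8
W_target = rng.normal(size=(OBS_DIM, EMB_DIM))  # frozen random target network
W_pred = np.zeros((OBS_DIM, EMB_DIM))           # trainable predictor network

def intrinsic_bonus(obs):
    """Prediction error of the predictor on the frozen target's output."""
    target = obs @ W_target
    pred = obs @ W_pred
    return float(np.mean((target - pred) ** 2))

def train_predictor(obs, lr=0.05):
    """One gradient step on the predictor's MSE toward the target."""
    global W_pred
    err = obs @ W_pred - obs @ W_target  # shape (EMB_DIM,)
    W_pred -= lr * np.outer(obs, err)    # gradient of the squared error

obs = rng.normal(size=OBS_DIM)
before = intrinsic_bonus(obs)   # novel state: large bonus
for _ in range(50):
    train_predictor(obs)        # state is visited repeatedly
after = intrinsic_bonus(obs)    # familiar state: bonus shrinks
```

In the paper the target and predictor are deep networks and the bonus is combined with the extrinsic reward; the sketch only shows why the error decays for familiar states.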
Data-Efficient Hierarchical Reinforcement Learning
This paper studies how to develop HRL algorithms that are general, in that they do not make onerous additional assumptions beyond standard RL algorithms, and efficient, in the sense that they can be used with modest numbers of interaction samples, making them suitable for real-world problems such as robotic control.
Near-Optimal Representation Learning for Hierarchical Reinforcement Learning
Develops a notion of sub-optimality of a representation, defined in terms of the expected reward of the optimal hierarchical policy using that representation; results on a number of difficult continuous-control tasks show that it yields qualitatively better representations as well as quantitatively better hierarchical policies than existing methods.
k-means++: the advantages of careful seeding
By augmenting k-means with a very simple, randomized seeding technique, this work obtains an algorithm that is O(log k)-competitive with the optimal clustering.
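The seeding technique can be sketched in a few lines. This is an illustrative 1-D version, not the paper's code: the first center is chosen uniformly, and each subsequent center is drawn with probability proportional to its squared distance to the nearest center already chosen (D² weighting).

```python
import random

def kmeanspp_seed(points, k, rng=random.Random(0)):
    """k-means++ seeding on 1-D points.

    Assumes k is at most the number of distinct points.
    """
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min((p - c) ** 2 for c in centers) for p in points]
        # Sample the next center proportionally to d2.
        r = rng.random() * sum(d2)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc > r:
                centers.append(p)
                break
    return centers

centers = kmeanspp_seed([0.0, 0.1, 5.0, 5.1, 10.0], k=3)
```

Because far-away points are weighted heavily, the seeds tend to land in distinct clusters, which is what underlies the competitiveness guarantee.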
Learning Subgoal Representations with Slow Dynamics
State Entropy Maximization with Random Encoders for Efficient Exploration
The experiments show that RE3 significantly improves the sample-efficiency of both model-free and model-based RL methods on locomotion and navigation tasks from DeepMind Control Suite and MiniGrid benchmarks, and allows learning diverse behaviors without extrinsic rewards.
Subgoal Search For Complex Reasoning Tasks
It is shown that a simple approach of generating subgoals k steps ahead is surprisingly efficient on three challenging domains: two popular puzzle games, Sokoban and the Rubik's Cube, and an inequality-proving benchmark, INT.
World Model as a Graph: Learning Latent Landmarks for Planning
This work proposes to learn graph-structured world models composed of sparse, multi-step transitions, and devises a novel algorithm to learn latent landmarks scattered across the goal space as the nodes of the graph; the authors view this as an important step towards scalable planning in reinforcement learning.
A Policy Gradient Method for Task-Agnostic Exploration
It is argued that the entropy of the state distribution induced by limited-horizon trajectories is a sensible target, and a novel, practical policy-search algorithm, Maximum Entropy POLicy optimization (MEPOL), is presented to learn a policy that maximizes a non-parametric, k-nearest-neighbors estimate of the state-distribution entropy.
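A rough sketch of the kind of non-parametric entropy estimate such methods maximize (a Kozachenko-Leonenko-style k-NN estimator, simplified here to its distance term; not the paper's exact objective): entropy is estimated from the average log-distance to each visited state's k-th nearest neighbor, so spread-out state visits score higher than clustered ones.

```python
import numpy as np

def knn_entropy(states, k=3):
    """k-NN state-entropy estimate (distance term only, up to constants)."""
    states = np.asarray(states, dtype=float)
    # Pairwise Euclidean distances between all visited states.
    dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    # Column 0 of the sorted rows is the self-distance (0), so column k
    # is the distance to the k-th nearest neighbor.
    kth = np.sort(dists, axis=1)[:, k]
    return float(np.mean(np.log(kth + 1e-12)))

# Same draws, different scale: the spread-out batch should score higher.
clustered = np.random.default_rng(0).normal(0, 0.1, size=(50, 2))
spread = np.random.default_rng(0).normal(0, 2.0, size=(50, 2))
```

In MEPOL this estimate is differentiated through the policy rather than computed on a fixed batch; the sketch only shows why the estimator rewards coverage of the state space.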
Agent57: Outperforming the Atari Human Benchmark
This work proposes Agent57, the first deep RL agent to outperform the standard human benchmark on all 57 Atari games; it trains a neural network that parameterizes a family of policies ranging from very exploratory to purely exploitative.