Corpus ID: 235658502

Unsupervised Skill Discovery with Bottleneck Option Learning

@inproceedings{Kim2021UnsupervisedSD,
  title={Unsupervised Skill Discovery with Bottleneck Option Learning},
  author={Jaekyeom Kim and Seohong Park and Gunhee Kim},
  booktitle={ICML},
  year={2021}
}
The ability to acquire inherent skills from environments, as humans do, without any external rewards or supervision is an important problem. We propose a novel unsupervised skill discovery method named Information Bottleneck Option Learning (IBOL). On top of the linearization of environments, which promotes more varied and distant state transitions, IBOL enables the discovery of diverse skills. It provides the abstraction of the skills learned with the information bottleneck framework for…
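For context, the information bottleneck framework invoked in the abstract optimizes a trade-off between compressing a representation and preserving the information relevant to a target. A minimal sketch of the classic objective (Tishby et al.); the mapping onto IBOL's skill variables is our reading, not the paper's exact notation:

\max_{p(z \mid x)} \; I(Z; Y) - \beta \, I(Z; X)

Here X is the raw input (e.g., a trajectory), Y the quantity to be predicted, Z the compressed skill representation, and \beta the coefficient trading compression against predictiveness; a larger \beta yields a more abstract, more disentangled Z.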

Lipschitz-constrained Unsupervised Skill Discovery

TLDR
Through experiments on various MuJoCo robotic locomotion and manipulation environments, it is demonstrated that LSD outperforms previous approaches in terms of skill diversity, state space coverage, and performance on seven downstream tasks including the challenging task of following multiple goals on Humanoid.
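As a sketch of LSD's core idea (our paraphrase, with notation that may differ from the paper): a skill z is rewarded for producing state transitions aligned with z under a Lipschitz-constrained state representation \phi, so that skill diversity must manifest as genuinely different state trajectories rather than as trivially separable features:

r_t = \big(\phi(s_{t+1}) - \phi(s_t)\big)^{\top} z \quad \text{subject to} \quad \|\phi(x) - \phi(y)\| \le \|x - y\| \;\; \forall x, y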

Beyond Rewards: a Hierarchical Perspective on Offline Multiagent Behavioral Analysis

TLDR
This paper introduces a model-agnostic method for discovering behavior clusters in multiagent domains, using variational inference to learn a hierarchy of behaviors at the joint and local agent levels, and illustrates the effectiveness of the approach for enabling a coupled understanding of behaviors.

Braxlines: Fast and Interactive Toolkit for RL-driven Behavior Engineering beyond Reward Maximization

TLDR
Braxlines is introduced, a toolkit for fast and interactive RL-driven behavior generation beyond simple reward maximization; it includes Composer, a programmatic API for generating continuous control environments, and a set of stable, well-tested baselines for two families of algorithms: mutual information maximization (MiMax) and divergence minimization (DMin).

An information-theoretic perspective on intrinsic motivation in reinforcement learning: a survey

TLDR
This work computationally revisits the notions of surprise, novelty and skill learning, and suggests that novelty and surprise can assist the building of a hierarchy of transferable skills that further abstracts the environment and makes the exploration process more robust.

References


Explore, Discover and Learn: Unsupervised Discovery of State-Covering Skills

TLDR
This work performs an extensive evaluation of skill discovery methods on controlled environments and shows that EDL offers significant advantages, such as overcoming the coverage problem, reducing the dependence of learned skills on the initial state, and allowing the user to define a prior over which behaviors should be learned.

Dynamics-Aware Unsupervised Discovery of Skills

TLDR
This work proposes an unsupervised learning algorithm, Dynamics-Aware Discovery of Skills (DADS), which simultaneously discovers predictable behaviors and learns their dynamics, and demonstrates that zero-shot planning in the learned latent space significantly outperforms standard MBRL and model-free goal-conditioned RL and substantially improves over prior hierarchical RL methods for unsupervised skill discovery.
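A sketch of the DADS intrinsic reward as commonly presented (notation ours): a skill-dynamics model q_\phi(s' \mid s, z) learns to predict transitions, and the reward approximates the mutual information I(s'; z \mid s) by contrasting the current skill's prediction against predictions under L skills drawn from the prior:

r(s, z, s') = \log \frac{q_\phi(s' \mid s, z)}{\tfrac{1}{L} \sum_{i=1}^{L} q_\phi(s' \mid s, z_i)}, \qquad z_i \sim p(z)

Skills that make s' predictable given (s, z) but not given s alone score highly, which is what makes the learned latent space usable for zero-shot planning.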

Diversity is All You Need: Learning Skills without a Reward Function

TLDR
DIAYN ("Diversity is All You Need") is a proposed method for learning useful skills without a reward function; it learns skills by maximizing an information-theoretic objective using a maximum entropy policy.
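The information-theoretic objective mentioned above can be sketched as follows (reconstructed from the DIAYN paper; treat the exact form as our reading): maximize the mutual information between states and skills plus policy entropy, via a variational lower bound with a learned skill discriminator q_\phi(z \mid s):

\mathcal{F}(\theta) = I(S; Z) + \mathcal{H}[A \mid S] - I(A; Z \mid S) \;\ge\; \mathbb{E}_{z \sim p(z),\, s \sim \pi}\big[\log q_\phi(z \mid s) - \log p(z)\big] + \mathcal{H}[A \mid S, Z]

In practice this reduces to training a maximum-entropy policy on the pseudo-reward r = \log q_\phi(z \mid s) - \log p(z).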

Stochastic Neural Networks for Hierarchical Reinforcement Learning

TLDR
This work proposes a general framework that first learns useful skills in a pre-training environment, and then leverages the acquired skills for learning faster in downstream tasks, and uses Stochastic Neural Networks combined with an information-theoretic regularizer to efficiently pre-train a large span of skills.

Hierarchical Reinforcement Learning By Discovering Intrinsic Options

TLDR
The effectiveness of HIDIO is demonstrated compared against other reinforcement learning methods in achieving high rewards with better sample efficiency across a variety of robotic navigation and manipulation tasks.

OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning

TLDR
This work finds that, when presented with offline data composed of a variety of behaviors, an effective way to leverage this data is to extract a continuous space of recurring and temporally extended primitive behaviors before using these primitives for downstream task learning.

Variational Option Discovery Algorithms

TLDR
This work highlights a tight connection between variational option discovery methods and variational autoencoders, introduces Variational Autoencoding Learning of Options by Reinforcement (VALOR), a new method derived from this connection, and proposes a curriculum learning approach.
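To make the stated VAE connection concrete (a rough sketch in our notation, not the paper's exact formulation): the policy plays the role of an encoder mapping a sampled context c to a trajectory \tau, a decoder P_D recovers the context from the trajectory, and the objective rewards recoverable, high-entropy behavior:

\max_{\pi, D} \; \mathbb{E}_{c \sim G}\Big[\mathbb{E}_{\tau \sim \pi(\cdot \mid c)}\big[\log P_D(c \mid \tau)\big]\Big] + \beta \, \mathcal{H}[\pi]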

Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations

TLDR
This paper theoretically shows that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases on both the models and the data, and trains more than 12,000 models covering the most prominent methods and evaluation metrics on seven different data sets.

Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play

TLDR
This work describes a simple scheme that allows an agent to learn about its environment in an unsupervised manner, and focuses on two kinds of environments: (nearly) reversible environments and environments that can be reset.

beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework

Learning an interpretable factorised representation of the independent data generative factors of the world without supervision is an important precursor for the development of artificial intelligence.
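For reference, the beta-VAE objective in its standard form (reconstructed from the literature): the usual VAE evidence lower bound with the KL term up-weighted by \beta > 1, which pressures the approximate posterior toward the factorized prior and thereby encourages disentangled latents:

\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - \beta \, D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big)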