• Corpus ID: 240070726

URLB: Unsupervised Reinforcement Learning Benchmark

@article{laskin2021urlb,
  title={URLB: Unsupervised Reinforcement Learning Benchmark},
  author={Michael Laskin and Denis Yarats and Hao Liu and Kimin Lee and Albert Zhan and Kevin Lu and Catherine Cang and Lerrel Pinto and Pieter Abbeel},
}
Deep Reinforcement Learning (RL) has emerged as a powerful paradigm to solve a range of complex yet specific control tasks. Yet training generalist agents that can quickly adapt to new tasks remains an outstanding challenge. Recent advances in unsupervised RL have shown that pre-training RL agents with self-supervised intrinsic rewards can result in efficient adaptation. However, these algorithms have been hard to compare and develop due to the lack of a unified benchmark. To this end, we… 

Light-weight probing of unsupervised representations for Reinforcement Learning

This work designs an evaluation protocol for unsupervised RL representations with lower variance and up to 600x lower computational cost, and improves existing self-supervised learning (SSL) recipes for RL, highlighting the importance of the forward model, the size of the visual backbone, and the precise formulation of the unsupervised objective.

Pretraining in Deep Reinforcement Learning: A Survey

This survey seeks to systematically review existing works in pretraining for deep reinforcement learning, provide a taxonomy of these methods, discuss each sub-field, and bring attention to open problems and future directions.

Unsupervised Model-based Pre-training for Data-efficient Reinforcement Learning from Pixels

This work closes the performance gap in the Unsupervised RL Benchmark, a collection of tasks to be solved in a data-efficient manner after interacting with the environment in a self-supervised way, and investigates the limitations of the pre-trained agent.

APD: Learning Diverse Behaviors for Reinforcement Learning Through Unsupervised Active Pre-Training

This work introduces an unsupervised active pre-training algorithm for diverse behavior induction (APD) that explicitly characterizes the behavior variables with a state-dependent sampling method, so that the agent can decompose the entire state space into parts for fine-grained and diverse behavior learning.

The Challenges of Exploration for Offline Reinforcement Learning

This work proposes to evaluate the quality of collected data by transferring the collected data and inferring policies with reward relabelling and standard offline RL algorithms, and evaluates a wide variety of data collection strategies, including a new exploration agent, Intrinsic Model Predictive Control, using this scheme.

EUCLID: Towards Efficient Unsupervised Reinforcement Learning with Multi-choice Dynamics Model

This work introduces a novel model-fused paradigm to jointly pre-train the dynamics model and unsupervised exploration policy in the pre-training phase, thus better leveraging the environmental samples and improving the downstream task sampling efficiency.

Does Zero-Shot Reinforcement Learning Exist?

Improved losses and new successor-feature (SF) models are introduced, and the viability of zero-shot RL schemes is systematically tested on tasks from the Unsupervised RL Benchmark, to disentangle universal representation learning from exploration.

POLTER: Policy Trajectory Ensemble Regularization for Unsupervised Reinforcement Learning

This work presents POLTER (Policy Trajectory Ensemble Regularization) – a general method to regularize the pretraining that can be applied to any URL algorithm and is especially useful on data- and knowledge-based URL algorithms.

Learning General World Models in a Handful of Reward-Free Deployments

This work introduces the reward-free deployment efficiency setting, a new paradigm for RL research, and presents CASCADE, a novel approach for self-supervised exploration in this new setting, using an information theoretic objective inspired by Bayesian Active Learning.

The Information Geometry of Unsupervised Reinforcement Learning

This work shows that unsupervised skill discovery algorithms based on mutual information maximization do not learn skills that are optimal for every possible reward function; however, the distribution over skills provides an optimal initialization that minimizes regret against adversarially chosen reward functions, assuming a certain type of adaptation procedure.
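The mutual-information objective that this line of skill-discovery work analyzes is standardly written with the variational lower bound below (the form popularized by DIAYN-style methods, not an equation specific to this entry's paper); \(z\) is the skill latent, \(s\) the state, and \(q_\phi\) a learned discriminator:

```latex
\max_{\pi} \; I(S; Z)
  = \mathcal{H}(Z) - \mathcal{H}(Z \mid S)
  \;\ge\; \mathbb{E}_{z \sim p(z),\, s \sim \pi_z}
      \big[ \log q_\phi(z \mid s) - \log p(z) \big]
```

The bound holds because the discriminator's cross-entropy upper-bounds the true conditional entropy \(\mathcal{H}(Z \mid S)\); the result summarized above concerns what maximizing this quantity can and cannot guarantee about downstream reward functions.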



References

Reinforcement Learning with Unsupervised Auxiliary Tasks

This paper significantly outperforms the previous state-of-the-art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks, leading to a mean speedup in learning of 10x and averaging 87% expert human performance on Labyrinth.

Beyond Fine-Tuning: Transferring Behavior in Reinforcement Learning

This work introduces Behavior Transfer (BT), a technique that leverages pre-trained policies for exploration and that is complementary to transferring neural network weights, and shows that, when combined with large-scale pre-training in the absence of rewards, existing intrinsic motivation objectives can lead to the emergence of complex behaviors.

Decoupling Representation Learning from Reinforcement Learning

A new unsupervised learning task, called Augmented Temporal Contrast (ATC), trains a convolutional encoder to associate pairs of observations separated by a short time difference, under image augmentations and using a contrastive loss.
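A rough sketch of the InfoNCE-style contrastive loss that underlies this kind of temporal-contrast training (this is not the authors' code; the function name, batch shapes, and temperature value are illustrative). Each anchor embedding is pushed toward the embedding of a temporally nearby observation and away from the other rows in the batch:

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss over a batch of embedding pairs.

    anchors, positives: (batch, dim) arrays; row i of `positives` embeds the
    observation a short time after the one behind row i of `anchors`.
    All other rows in the batch serve as negatives.
    """
    # Cosine similarity between every anchor and every positive.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                       # (batch, batch)
    # Cross-entropy with the matching pair (the diagonal) as the label.
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 32))
# Matched pairs (positive is a slightly perturbed anchor) score lower loss
# than unrelated random pairs.
loss_matched = info_nce_loss(z, z + 0.01 * rng.normal(size=z.shape))
loss_random = info_nce_loss(z, rng.normal(size=z.shape))
assert loss_matched < loss_random
```

In the actual method the two embeddings come from augmented image observations a few timesteps apart, and the encoder is trained so this loss alone produces representations useful for downstream RL.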

D4RL: Datasets for Deep Data-Driven Reinforcement Learning

This work introduces benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL, and releases benchmark tasks and datasets with a comprehensive evaluation of existing algorithms and an evaluation protocol together with an open-source codebase.

Benchmarking Deep Reinforcement Learning for Continuous Control

This work presents a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure.

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, which achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.
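The maximum-entropy idea behind soft actor-critic shows up concretely in its TD target, which adds an entropy bonus to the usual Bellman backup. A minimal numpy sketch (not the reference implementation; names, the twin-critic inputs, and default coefficients are illustrative):

```python
import numpy as np

def soft_bellman_target(rewards, next_q1, next_q2, next_log_pi,
                        alpha=0.2, gamma=0.99, done=None):
    """Entropy-regularized TD target in the soft actor-critic style:
    y = r + gamma * (min(Q1', Q2') - alpha * log pi(a'|s')).
    The -alpha * log pi term rewards high-entropy (stochastic) policies;
    the min over twin critics curbs Q-value overestimation."""
    if done is None:
        done = np.zeros_like(rewards)
    soft_v = np.minimum(next_q1, next_q2) - alpha * next_log_pi
    return rewards + gamma * (1.0 - done) * soft_v

# Two transitions: min-Q picks 1.5 and 2.0; log pi = -1 adds +0.2 entropy bonus.
r = np.array([1.0, 0.0])
y = soft_bellman_target(r, np.array([2.0, 2.0]), np.array([1.5, 3.0]),
                        next_log_pi=np.array([-1.0, -1.0]))
# y = [1 + 0.99 * 1.7, 0 + 0.99 * 2.2] = [2.683, 2.178]
```

Raising `alpha` weights exploration (entropy) more heavily; the full algorithm also tunes it automatically in later variants.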

RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning

This paper proposes RL Unplugged, a suite of benchmarks to evaluate and compare offline RL methods, designed to increase the reproducibility of experiments and make it possible to study challenging tasks within a limited computational budget, making RL research both more systematic and more accessible across the community.

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

An open-source simulated benchmark for meta-reinforcement learning and multi-task learning, consisting of 50 distinct robotic manipulation tasks, is proposed to enable the development of algorithms that generalize and accelerate the acquisition of entirely new, held-out tasks.

Data-Efficient Reinforcement Learning with Self-Predictive Representations

The method, Self-Predictive Representations (SPR), trains an agent to predict its own latent state representations multiple steps into the future, using a target encoder whose weights are an exponential moving average of the agent's parameters, together with a learned transition model.
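The exponential-moving-average (Polyak) target update mentioned here is a small, standard mechanism; a sketch under illustrative names and a placeholder `tau` (not SPR's actual code):

```python
import numpy as np

def ema_update(target_params, online_params, tau=0.99):
    """Polyak/EMA update for a target encoder:
    theta_target <- tau * theta_target + (1 - tau) * theta_online.
    The target changes slowly, giving stable prediction targets."""
    return [tau * t + (1.0 - tau) * o
            for t, o in zip(target_params, online_params)]

# With a static online network, the target converges toward it geometrically.
online = [np.ones((4, 4))]
target = [np.zeros((4, 4))]
for _ in range(300):
    target = ema_update(target, online)
max_err = float(np.abs(target[0] - 1.0).max())  # 0.99**300, about 0.049
```

The slow-moving target is what prevents the latent-prediction objective from collapsing to a trivial constant representation.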

Reinforcement Learning with Augmented Data

It is shown that augmentations such as random translate, crop, color jitter, patch cutout, random convolutions, and amplitude scale can enable simple RL algorithms to outperform complex state-of-the-art methods across common benchmarks.
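The random translate/crop augmentation this line refers to is simple to state in code; a numpy sketch with illustrative shapes (the 100 -> 84 crop size is a common choice in pixel-based RL pipelines, used here only as an example):

```python
import numpy as np

def random_crop(imgs, out_size, rng):
    """Independently random-crop a batch of (N, H, W, C) images to
    (N, out_size, out_size, C), as in translate/crop data augmentation."""
    n, h, w, _ = imgs.shape
    ys = rng.integers(0, h - out_size + 1, size=n)  # top-left corners
    xs = rng.integers(0, w - out_size + 1, size=n)
    return np.stack([img[y:y + out_size, x:x + out_size]
                     for img, y, x in zip(imgs, ys, xs)])

rng = np.random.default_rng(0)
batch = rng.normal(size=(16, 100, 100, 3))
crops = random_crop(batch, out_size=84, rng=rng)
```

Applying a fresh crop to each observation (and to each replay sample) is the whole trick: the RL algorithm itself is left unchanged.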