• Corpus ID: 240070726

# URLB: Unsupervised Reinforcement Learning Benchmark

@article{Laskin2021URLBUR,
  title={URLB: Unsupervised Reinforcement Learning Benchmark},
  author={Michael Laskin and Denis Yarats and Hao Liu and Kimin Lee and Albert Zhan and Kevin Lu and Catherine Cang and Lerrel Pinto and P. Abbeel},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.15191}
}
• Published 28 October 2021
• Computer Science
• ArXiv
Deep Reinforcement Learning (RL) has emerged as a powerful paradigm to solve a range of complex yet specific control tasks. Yet training generalist agents that can quickly adapt to new tasks remains an outstanding challenge. Recent advances in unsupervised RL have shown that pre-training RL agents with self-supervised intrinsic rewards can result in efficient adaptation. However, these algorithms have been hard to compare and develop due to the lack of a unified benchmark. To this end, we…
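The pre-train-then-adapt recipe the abstract describes can be illustrated with a toy curiosity-style intrinsic reward. The sketch below uses an RND-flavoured signal (prediction error against a frozen random network); every name, dimension, and learning rate here is illustrative, not URLB's actual API:

```python
import numpy as np

# Minimal sketch: a self-supervised intrinsic reward for reward-free
# pre-training. The agent is rewarded for visiting states where a
# learned predictor still disagrees with a fixed random network.
rng = np.random.default_rng(0)

OBS_DIM, FEAT_DIM = 8, 16
W_target = rng.normal(size=(OBS_DIM, FEAT_DIM))  # frozen random network
W_pred = np.zeros((OBS_DIM, FEAT_DIM))           # learned predictor

def intrinsic_reward(obs):
    """Prediction error of the learned net vs. the frozen random net."""
    err = obs @ W_pred - obs @ W_target
    return float((err ** 2).mean())

def update_predictor(obs, lr=0.01):
    """One gradient step on the squared prediction error."""
    global W_pred
    err = obs @ W_pred - obs @ W_target          # shape (FEAT_DIM,)
    W_pred -= lr * np.outer(obs, err)            # dL/dW_pred

obs = rng.normal(size=OBS_DIM)
before = intrinsic_reward(obs)
for _ in range(50):
    update_predictor(obs)
after = intrinsic_reward(obs)
# The reward shrinks for familiar states, steering the agent toward
# novelty during pre-training; downstream fine-tuning then swaps this
# signal for the task's extrinsic reward.
assert after < before
```

The same interface (pre-train on `intrinsic_reward`, fine-tune on task reward) is the protocol the benchmark standardizes; the specific curiosity signal varies per algorithm.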

## Citations (43)

• Computer Science
ArXiv
• 2022
This work designs an evaluation protocol for unsupervised RL representations with lower variance and up to 600x lower computational cost, and improves existing self-supervised learning (SSL) recipes for RL, highlighting the importance of the forward model, the size of the visual backbone, and the precise formulation of the unsupervised objective.
• Computer Science
ArXiv
• 2022
This survey seeks to systematically review existing works in pretraining for deep reinforcement learning, provide a taxonomy of these methods, discuss each sub-field, and bring attention to open problems and future directions.
• Computer Science
• 2022
This work closes the performance gap in the Unsupervised RL Benchmark, a collection of tasks to be solved in a data-efficient manner, after interacting with the environment in a self-supervised way, and investigates the limitations of the pre-trained agent.
• Computer Science
IEEE Robotics and Automation Letters
• 2022
This work introduces an unsupervised active pre-training algorithm for diverse behavior induction (APD) that explicitly characterizes the behavior variables with a state-dependent sampling method, allowing the agent to decompose the entire state space into parts for fine-grained and diverse behavior learning.
• Computer Science
ArXiv
• 2022
This work proposes to evaluate the quality of collected data by transferring the collected data and inferring policies with reward relabelling and standard offline RL algorithms, and evaluates a wide variety of data collection strategies, including a new exploration agent, Intrinsic Model Predictive Control, using this scheme.
• Computer Science
ArXiv
• 2022
This work introduces a novel model-fused paradigm to jointly pre-train the dynamics model and unsupervised exploration policy in the pre-training phase, thus better leveraging the environmental samples and improving the downstream task sampling efficiency.
• Computer Science
ArXiv
• 2022
Improved losses and new SF models are introduced, and the viability of zero-shot RL schemes systematically on tasks from the Unsupervised RL benchmark is tested, to disentangle universal representation learning from exploration.
• Computer Science
ArXiv
• 2022
This work presents POLTER (Policy Trajectory Ensemble Regularization) – a general method to regularize the pretraining that can be applied to any URL algorithm and is especially useful on data- and knowledge-based URL algorithms.
• Computer Science
ArXiv
• 2022
This work introduces the reward-free deployment efficiency setting, a new paradigm for RL research, and presents CASCADE, a novel approach for self-supervised exploration in this new setting, using an information theoretic objective inspired by Bayesian Active Learning.
• Computer Science
ICLR
• 2022
This work shows that unsupervised skill discovery algorithms based on mutual information maximization do not learn skills that are optimal for every possible reward function. However, the distribution over skills is shown to provide an optimal initialization that minimizes regret against adversarially chosen reward functions, assuming a certain type of adaptation procedure.

## References

Showing 1-10 of 65 references

• Computer Science
ICLR
• 2017
This paper significantly outperforms the previous state-of-the-art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks, leading to a mean speedup in learning of 10× and averaging 87% expert human performance on Labyrinth.
• Computer Science
• 2021
This work introduces Behavior Transfer (BT), a technique that leverages pre-trained policies for exploration and that is complementary to transferring neural network weights, and shows that, when combined with large-scale pre-training in the absence of rewards, existing intrinsic motivation objectives can lead to the emergence of complex behaviors.
• Computer Science
ICML
• 2021
A new unsupervised learning task, called Augmented Temporal Contrast (ATC), trains a convolutional encoder to associate pairs of observations separated by a short time difference, under image augmentations and using a contrastive loss.
• Computer Science
ArXiv
• 2020
This work introduces benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL, and releases benchmark tasks and datasets with a comprehensive evaluation of existing algorithms and an evaluation protocol together with an open-source codebase.
• Computer Science
ICML
• 2016
This work presents a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, task with partial observations, and tasks with hierarchical structure.
• Computer Science
ICML
• 2018
This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, which achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.
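The maximum-entropy framework referenced here augments expected return with a policy-entropy bonus weighted by a temperature α. In standard notation (a sketch of the well-known formulation, not a claim about this paper's exact symbols):

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \big[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \big]
```

Larger α favors more stochastic, exploratory policies; α → 0 recovers the conventional expected-return objective.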
• Computer Science
• 2020
This paper proposes a benchmark called RL Unplugged to evaluate and compare offline RL methods, a suite of benchmarks that will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community.
• Computer Science
CoRL
• 2019
An open-source simulated benchmark for meta-reinforcement learning and multi-task learning, consisting of 50 distinct robotic manipulation tasks, is proposed to make it possible to develop algorithms that generalize and accelerate the acquisition of entirely new, held-out tasks.
• Computer Science
ICLR
• 2021
The method, Self-Predictive Representations (SPR), trains an agent to predict its own latent state representations multiple steps into the future using an encoder which is an exponential moving average of the agent’s parameters and a learned transition model.
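The exponential-moving-average (EMA) target encoder mentioned here can be sketched in miniature, with weights as plain lists; `tau` and all names below are illustrative, not SPR's actual code:

```python
# Toy sketch of an EMA target-network update: the target's weights
# slowly track the online encoder instead of copying it each step,
# giving stable prediction targets for the self-predictive loss.

def ema_update(online, target, tau=0.99):
    """target <- tau * target + (1 - tau) * online, elementwise."""
    return [tau * t + (1 - tau) * o for o, t in zip(online, target)]

online = [1.0, 2.0]   # stand-in for online-encoder weights
target = [0.0, 0.0]   # target starts cold

for _ in range(200):
    target = ema_update(online, target)

# After many updates the target has drifted most of the way toward
# the online weights while smoothing out step-to-step noise.
```

A `tau` close to 1 makes the target move slowly, which is what keeps the multi-step latent prediction targets from collapsing.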
• Computer Science
NeurIPS
• 2020
It is shown that augmentations such as random translate, crop, color jitter, patch cutout, random convolutions, and amplitude scale can enable simple RL algorithms to outperform complex state-of-the-art methods across common benchmarks.