Corpus ID: 235377401

Pretraining Representations for Data-Efficient Reinforcement Learning

@inproceedings{Schwarzer2021PretrainingRF,
  title={Pretraining Representations for Data-Efficient Reinforcement Learning},
  author={Max Schwarzer and Nitarshan Rajkumar and Michael Noukhovitch and Ankesh Anand and Laurent Charlin and Devon Hjelm and Philip Bachman and Aaron C. Courville},
  booktitle={NeurIPS},
  year={2021}
}
Data efficiency is a key challenge for deep reinforcement learning. We address this problem by using unlabeled data to pretrain an encoder which is then finetuned on a small amount of task-specific data. To encourage learning representations which capture diverse aspects of the underlying MDP, we employ a combination of latent dynamics modelling and unsupervised goal-conditioned RL. When limited to 100k steps of interaction on Atari games (equivalent to two hours of human experience), our… 
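
As a rough illustration of the recipe in the abstract (pretrain an encoder on unlabeled transitions, then finetune on a small task-specific budget), here is a minimal PyTorch sketch of the latent dynamics modelling term alone. The module shapes, the absence of a target network, and the omission of the goal-conditioned RL and inverse-dynamics objectives are all simplifications of mine, not the paper's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentDynamicsPretrainer(nn.Module):
    """Hypothetical sketch: pretrain an encoder on unlabeled
    (obs, action, next_obs) transitions by predicting next-step latents."""
    def __init__(self, obs_dim=64, latent_dim=128, n_actions=18):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        # Transition model conditions the current latent on the action taken.
        self.transition = nn.Sequential(nn.Linear(latent_dim + n_actions, 256),
                                        nn.ReLU(), nn.Linear(256, latent_dim))

    def loss(self, obs, action_onehot, next_obs):
        z = self.encoder(obs)
        z_pred = self.transition(torch.cat([z, action_onehot], dim=-1))
        with torch.no_grad():                      # stop-gradient target branch
            z_next = self.encoder(next_obs)
        # Negative cosine similarity between predicted and actual next latents.
        return -F.cosine_similarity(z_pred, z_next, dim=-1).mean()

After pretraining on unlabeled trajectories, the encoder weights would initialize the RL agent, which is then finetuned within the 100k-step interaction budget.

Citations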
VRL3: A Data-Driven Framework for Visual Deep Reinforcement Learning
TLDR
On a set of highly challenging hand manipulation tasks with sparse reward and realistic visual inputs, this framework learns 370%-1200% faster than the previous SOTA method while using an encoder that is 50 times smaller, fully demonstrating the potential of data-driven deep reinforcement learning.
Reinforcement Learning with Action-Free Pre-Training from Videos
TLDR
A framework that learns representations useful for understanding the dynamics via generative pretraining on videos that improves both performances and sample-efficiency of vision-based RL in a variety of manipulation and locomotion tasks is introduced.
Omni-Training for Data-Efficient Deep Learning
TLDR
It is found that even a tight combination of pre-training and meta-training through a joint representation flow cannot achieve both kinds of transferability, motivating the proposed Omni-Training framework for data-efficient deep learning.
Mask-based Latent Reconstruction for Reinforcement Learning
TLDR
This work proposes a simple yet effective self-supervised method, Mask-based Latent Reconstruction (MLR), to predict the complete state representations in the latent space from the observations with spatially and temporally masked pixels, and significantly improves the sample efficiency in RL and outperforms the state-of-the-art sample-efficient RL methods on multiple continuous benchmark environments.
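
A rough sketch of the mask-then-reconstruct idea: randomly drop spatio-temporal pieces of a short observation sequence, encode the masked sequence, and train a predictor to match the latents of the unmasked sequence. The frame-level masking and module names below are simplifying assumptions of mine, not MLR's exact cube-masking architecture.

import torch
import torch.nn.functional as F

def mlr_style_loss(encoder, predictor, obs_seq, mask_ratio=0.5):
    """obs_seq: (batch, time, channels, height, width) stack of frames.
    Hypothetical simplification of mask-based latent reconstruction."""
    b, t, c, h, w = obs_seq.shape
    # Drop random temporal slices (whole frames here, for brevity).
    keep = (torch.rand(b, t, 1, 1, 1, device=obs_seq.device) > mask_ratio).float()
    masked = obs_seq * keep
    z_pred = predictor(encoder(masked.flatten(0, 1)))   # predict complete latents
    with torch.no_grad():
        z_target = encoder(obs_seq.flatten(0, 1))       # latents of raw frames
    return -F.cosine_similarity(z_pred, z_target, dim=-1).mean()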
Socially Supervised Representation Learning: The Role of Subjectivity in Learning Efficient Representations
TLDR
The results demonstrate how communication from subjective perspectives can lead to the acquisition of more abstract representations in multi-agent systems, opening promising perspectives for future research at the intersection of representation learning and emergent communication.
The Information Geometry of Unsupervised Reinforcement Learning
TLDR
This work shows that unsupervised skill discovery algorithms based on mutual information maximization do not learn skills that are optimal for every possible reward function, however, it is shown that the distribution over skills provides an optimal initialization minimizing regret against adversarially-chosen reward functions, assuming a certain type of adaptation procedure.
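
Concretely, the skill discovery algorithms in question maximize the mutual information between states S and a latent skill variable Z; in standard notation (a generic statement of the objective family, not this paper's notation):

\max_{\pi}\; I(S; Z) \;=\; H(Z) - H(Z \mid S)
\;=\; \mathbb{E}_{z \sim p(z),\; s \sim \pi(\cdot \mid z)}\big[\log p(z \mid s) - \log p(z)\big]

The paper's result is that no skill maximizing this objective is optimal for every reward function, but the induced distribution over skills is a regret-minimizing initialization against adversarially-chosen rewards.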
Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions
TLDR
This work proposes a new theoretically-motivated framework called Generalized Similarity Functions (GSF), which uses contrastive learning to train an offline RL agent to aggregate observations based on the similarity of their expected future behavior, where this similarity is quantified using generalized value functions.
DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations
TLDR
This paper proposes to learn the prototypes from the recurrent states of the world model, thereby distilling temporal structures from past observations and actions into the prototypes; the resulting model, DreamerPro, makes large performance gains on the DeepMind Control suite both in the standard setting and when there are complex background distractions.
Perceiving the World: Question-guided Reinforcement Learning for Text-based Games
TLDR
This paper introduces world-perceiving modules, which automatically decompose tasks and prune actions by answering questions about the environment, and proposes a two-phase training framework to decouple language learning from reinforcement learning, which further improves the sample efficiency.
Self-supervised Pretraining with Classification Labels for Temporal Activity Detection
TLDR
This work proposes a novel self-supervised pretraining method for detection that leverages classification labels to mitigate the disparity between classification pretraining and the detection task by introducing frame-level pseudo labels, multi-action frames, and action segments.

References

Showing 1-10 of 76 references
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
TLDR
A new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) is developed that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation.
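
IMPALA's decoupled actors and learner hinge on the V-trace off-policy correction, which reweights temporal-difference errors with truncated importance ratios. A small NumPy sketch of the V-trace targets from the published definition follows; the variable names and backward-recursion layout are mine.

import numpy as np

def vtrace_targets(rewards, values, bootstrap_value, rhos,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """V-trace targets (Espeholt et al., 2018), computed backwards via
    v_s - V(x_s) = delta_s + gamma * c_s * (v_{s+1} - V(x_{s+1})).
    rewards, values, rhos: arrays of length T; rhos are the importance
    ratios pi(a|x) / mu(a|x) between target and behaviour policies."""
    T = len(rewards)
    values_tp1 = np.append(values[1:], bootstrap_value)
    clipped_rho = np.minimum(rho_bar, rhos)
    clipped_c = np.minimum(c_bar, rhos)
    deltas = clipped_rho * (rewards + gamma * values_tp1 - values)
    vs = np.zeros(T)
    acc = 0.0
    for t in reversed(range(T)):
        acc = deltas[t] + gamma * clipped_c[t] * acc
        vs[t] = values[t] + acc
    return vs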
Reinforcement Learning with Prototypical Representations
TLDR
Proto-RL is a self-supervised framework that ties representation learning with exploration through prototypical representations that serve as a summarization of the exploratory experience of an agent as well as a basis for representing observations.
Model-Based Reinforcement Learning for Atari
TLDR
Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models, is described and a comparison of several model architectures is presented, including a novel architecture that yields the best results in the authors' setting.
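
The SimPLe recipe alternates three phases that can be written as a short outer loop. The callables below stand in for components the paper defines (real-data collection, world-model fitting, policy training in imagination), so this is a schematic under those assumptions rather than a faithful implementation.

def simple_style_loop(collect_real_data, fit_world_model,
                      train_policy_in_model, n_iterations=15):
    """Schematic SimPLe outer loop: gather a little real experience,
    fit the video prediction model on it, then train the policy
    entirely on imagined rollouts inside the model."""
    real_data = []
    for _ in range(n_iterations):
        real_data += collect_real_data()    # small amount of real interaction
        fit_world_model(real_data)          # supervised next-frame prediction
        train_policy_in_model()             # RL on imagined rollouts only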
Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning
Learning a good representation is an essential component for deep reinforcement learning (RL). Representation learning is especially important in multitask and partially observable settings where…
DeepMDP: Learning Continuous Latent Space Models for Representation Learning
TLDR
This work introduces the concept of a DeepMDP, a parameterized latent space model that is trained via the minimization of two tractable losses: prediction of rewards and prediction of the distribution over next latent states, and shows that the optimization of these objectives guarantees the quality of the latent space as a representation of the state space.
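
The two tractable losses can be stated directly: predict the reward from the latent state, and predict the next latent under a latent transition model. The sketch below uses a deterministic transition model and squared errors as a simplified stand-in for the paper's distributional formulation; all module names are placeholders.

import torch.nn.functional as F

def deepmdp_losses(encoder, latent_transition, reward_head,
                   obs, action, reward, next_obs):
    """Simplified DeepMDP-style objectives: reward prediction plus
    next-latent prediction under a deterministic latent model."""
    z = encoder(obs)
    reward_loss = F.mse_loss(reward_head(z, action), reward)
    z_next_pred = latent_transition(z, action)
    transition_loss = F.mse_loss(z_next_pred, encoder(next_obs).detach())
    return reward_loss + transition_loss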
Planning to Explore via Self-Supervised World Models
TLDR
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods and, in fact, almost matches the performance of an oracle that has access to rewards.
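
Plan2Explore's self-supervised exploration signal is the disagreement of an ensemble of one-step latent prediction models; a minimal version of that intrinsic reward is sketched here, with the ensemble interface and the variance aggregation being assumptions of mine.

import torch

def disagreement_reward(ensemble, z, action):
    """Intrinsic reward = variance across an ensemble of next-latent
    predictors; high disagreement marks novel, informative states.
    `ensemble` is any iterable of models mapping (z, action) -> next z."""
    preds = torch.stack([model(z, action) for model in ensemble])  # (K, B, D)
    # Mean over latent dimensions of the across-ensemble variance.
    return preds.var(dim=0).mean(dim=-1)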
Do recent advancements in model-based deep reinforcement learning really improve data efficiency?
TLDR
It is demonstrated that the state-of-the-art model-free Rainbow DQN algorithm can be trained using a much smaller number of samples than is commonly reported, at a fraction of the complexity and computational cost.
Representation Matters: Offline Pretraining for Sequential Decision Making
TLDR
Through a variety of experiments utilizing standard offline RL datasets, it is found that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms that otherwise yield mediocre performance on their own.
Unsupervised State Representation Learning in Atari
TLDR
This work introduces a method that learns state representations by maximizing mutual information across spatially and temporally distinct features of a neural encoder of the observations and introduces a new benchmark based on Atari 2600 games to evaluate representations based on how well they capture the ground truth state variables.
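
Mutual-information objectives of this kind are usually estimated with an InfoNCE bound, where features of temporally adjacent observations form positive pairs and other batch elements serve as negatives. The sketch below is a generic InfoNCE loss, not ST-DIM's exact spatial-temporal pairing.

import torch
import torch.nn.functional as F

def infonce_loss(z_t, z_tp1, temperature=0.1):
    """InfoNCE between features of consecutive observations.
    z_t, z_tp1: (batch, dim); row i of each is a positive pair."""
    z_t = F.normalize(z_t, dim=-1)
    z_tp1 = F.normalize(z_tp1, dim=-1)
    logits = z_t @ z_tp1.t() / temperature        # (batch, batch) similarities
    labels = torch.arange(z_t.shape[0], device=z_t.device)
    return F.cross_entropy(logits, labels)        # diagonal entries are positives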
Dueling Network Architectures for Deep Reinforcement Learning
TLDR
This paper presents a new neural network architecture for model-free reinforcement learning that leads to better policy evaluation in the presence of many similar-valued actions and enables the RL agent to outperform the state-of-the-art on the Atari 2600 domain.
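
The dueling idea itself is compact: split the network into a state-value stream and an advantage stream, then recombine them with a mean-subtracted aggregation so the decomposition is identifiable. Layer sizes in this sketch are arbitrary.

import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""
    def __init__(self, feature_dim=512, n_actions=18):
        super().__init__()
        self.value = nn.Linear(feature_dim, 1)
        self.advantage = nn.Linear(feature_dim, n_actions)

    def forward(self, features):
        v = self.value(features)                     # (batch, 1)
        a = self.advantage(features)                 # (batch, n_actions)
        return v + a - a.mean(dim=1, keepdim=True)   # identifiable aggregation

Subtracting the mean advantage pins down the V/A split, since Q alone does not determine it uniquely.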