Corpus ID: 222163237

Data-Efficient Reinforcement Learning with Self-Predictive Representations

@inproceedings{Schwarzer2021DataEfficientRL,
  title={Data-Efficient Reinforcement Learning with Self-Predictive Representations},
  author={Max Schwarzer and Ankesh Anand and Rishab Goel and R. Devon Hjelm and Aaron C. Courville and Philip Bachman},
  booktitle={ICLR},
  year={2021}
}
While deep reinforcement learning excels at solving tasks where large amounts of data can be collected through virtually unlimited interaction with the environment, learning from limited interaction remains a key challenge. We posit that an agent can learn more efficiently if we augment reward maximization with self-supervised objectives based on structure in its visual input and sequential interaction with the environment. Our method, Self-Predictive Representations (SPR), trains an agent to… 
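The objective the abstract describes (training the agent to predict its own latent representations several steps into the future, with prediction targets produced by a slowly-updated copy of the encoder) can be illustrated with a minimal sketch. The module names below (online_encoder, target_encoder, transition_model, projector, predictor) are illustrative assumptions, not the paper's actual implementation, which also maintains an EMA target projection head and adds this loss to a Rainbow-style RL loss:

    import torch
    import torch.nn.functional as F

    def ema_update(target_net, online_net, tau=0.99):
        # Move target-network weights toward the online network
        # (exponential moving average, as in BYOL-style targets).
        for t, o in zip(target_net.parameters(), online_net.parameters()):
            t.data.mul_(tau).add_(o.data, alpha=1.0 - tau)

    def spr_loss(obs, actions, future_obs,
                 online_encoder, target_encoder,
                 transition_model, projector, predictor):
        # obs: (batch, ...) current observations; actions: (batch, K);
        # future_obs: (batch, K, ...) the next K observations.
        z = online_encoder(obs)
        loss = 0.0
        for k in range(actions.shape[1]):
            z = transition_model(z, actions[:, k])  # roll the latent forward one step
            pred = predictor(projector(z))          # online projection + prediction head
            with torch.no_grad():                   # targets receive no gradient
                target = projector(target_encoder(future_obs[:, k]))
            # negative cosine similarity between predicted and target latents
            loss = loss - F.cosine_similarity(pred, target, dim=-1).mean()
        return loss

In training, this auxiliary loss would be added to the usual RL loss, with ema_update called after each optimizer step; the paper's actual schedules and architectures differ.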

Citations

Accelerating Representation Learning with View-Consistent Dynamics in Data-Efficient Reinforcement Learning
TLDR
This work introduces a formalism, the Multiview Markov Decision Process (MMDP), that incorporates multiple views of the state into the traditional MDP, and proposes a method, View-Consistent Dynamics (VCD), that learns state representations by training a view-consistent dynamics model in the latent space.
Procedural Generalization by Planning with Self-Supervised World Models
TLDR
Overall, this work suggests that building generalizable agents requires moving beyond the single-task, model-free paradigm and towards self-supervised model-based agents that are trained in rich, procedural, multi-task environments.
Pretraining Representations for Data-Efficient Reinforcement Learning
TLDR
This work uses unlabeled data to pretrain an encoder which is then finetuned on a small amount of task-specific data, and employs a combination of latent dynamics modelling and unsupervised goal-conditioned RL to encourage learning representations which capture diverse aspects of the underlying MDP.
Behavior From the Void: Unsupervised Active Pre-Training
TLDR
A new unsupervised pre-training method for reinforcement learning called APT (Active Pre-Training), which learns behaviors and representations by actively searching for novel states in reward-free environments, maximizing a non-parametric entropy computed in an abstract representation space (a rough sketch of this entropy bonus appears after this list).
Image Augmentation Based Momentum Memory Intrinsic Reward for Sparse Reward Visual Scenes
TLDR
A novel framework, IAMMIR, combining self-supervised representation learning with intrinsic motivation is presented, and a new type of intrinsic reward is designed: the Momentum Memory Intrinsic Reward (MMIR).
Learning Representations for Pixel-based Control: What Matters and Why?
TLDR
This paper presents a simple baseline approach that can learn meaningful representations with no metric-based learning, no data augmentations, no world-model learning, and no contrastive learning, and hopes this view motivates researchers to rethink representation learning when investigating how best to apply RL to real-world tasks.
Data-Efficient Reinforcement Learning
TLDR
This work employs a novel combination of latent dynamics modelling and goal-reaching objectives that exploits the inherent structure of data in reinforcement learning, and demonstrates that the method scales well with network capacity and pretraining data.
PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning
TLDR
This work proposes a novel method, dubbed PlayVirtual, which augments cycle-consistent virtual trajectories to enhance data efficiency in RL feature representation learning, and achieves state-of-the-art performance on both benchmarks.
...
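The APT entry above describes maximizing a non-parametric entropy in representation space. A rough sketch of such a particle-based (k-nearest-neighbour) entropy bonus follows; the function name, the constant inside the log, and the default k are illustrative assumptions, and the paper's exact estimator differs:

    import torch

    def knn_intrinsic_reward(z: torch.Tensor, k: int = 12) -> torch.Tensor:
        # Reward each latent by its distance to its k nearest neighbours in
        # the batch: states far from the others in representation space score
        # high, approximating a non-parametric entropy estimate.
        dists = torch.cdist(z, z)                  # (batch, batch) pairwise distances
        knn, _ = dists.topk(k + 1, largest=False)  # smallest distances, incl. self (0)
        return torch.log(1.0 + knn[:, 1:].mean(dim=1))  # drop self, average over k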

References

SHOWING 1-10 OF 54 REFERENCES
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning
TLDR
This work introduces Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning that performs on par with or better than the current state of the art on both transfer and semi-supervised benchmarks.
DeepMDP: Learning Continuous Latent Space Models for Representation Learning
TLDR
This work introduces the concept of a DeepMDP, a parameterized latent space model trained by minimizing two tractable losses: prediction of rewards and prediction of the distribution over next latent states; it shows that optimizing these objectives guarantees the quality of the latent space as a representation of the state space.
Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning
Learning a good representation is an essential component for deep reinforcement learning (RL). Representation learning is especially important in multitask and partially observable settings where…
Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels
TLDR
The addition of the augmentation method dramatically improves SAC's performance, enabling it to reach state-of-the-art performance on the DeepMind control suite, surpassing model-based methods and the recently proposed contrastive method CURL.
CURL: Contrastive Unsupervised Representations for Reinforcement Learning
TLDR
CURL extracts high-level features from raw pixels using contrastive learning and performs off-policy control on top of the extracted features; it is the first image-based algorithm to nearly match the sample efficiency of methods that use state-based features.
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
TLDR
The MuZero algorithm is presented, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics.
Model-Based Reinforcement Learning for Atari
TLDR
Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models, is described, and a comparison of several model architectures is presented, including a novel architecture that yields the best results in the authors' setting.
Representation Learning with Contrastive Predictive Coding
TLDR
This work proposes a universal unsupervised learning approach to extract useful representations from high-dimensional data, which it calls Contrastive Predictive Coding, and demonstrates that the approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments.
Rainbow: Combining Improvements in Deep Reinforcement Learning
TLDR
This paper examines six extensions to the DQN algorithm and empirically studies their combination, showing that the combination provides state-of-the-art performance on the Atari 2600 benchmark, both in terms of data efficiency and final performance.
Dueling Network Architectures for Deep Reinforcement Learning
TLDR
This paper presents a new neural network architecture for model-free reinforcement learning that leads to better policy evaluation in the presence of many similar-valued actions and enables the RL agent to outperform the state of the art on the Atari 2600 domain (a minimal sketch of the dueling aggregation appears after this list).
...
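The dueling architecture cited above separates the state value V(s) from per-action advantages A(s, a) and recombines them as Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a'), subtracting the mean advantage to keep the decomposition identifiable. A minimal sketch, with illustrative layer sizes and names rather than the paper's exact network:

    import torch
    import torch.nn as nn

    class DuelingHead(nn.Module):
        # Split shared features into a state-value stream V(s) and an
        # advantage stream A(s, a), then combine as Q = V + (A - mean(A)).
        def __init__(self, feature_dim: int, num_actions: int):
            super().__init__()
            self.value = nn.Linear(feature_dim, 1)
            self.advantage = nn.Linear(feature_dim, num_actions)

        def forward(self, features: torch.Tensor) -> torch.Tensor:
            v = self.value(features)                    # (batch, 1)
            a = self.advantage(features)                # (batch, num_actions)
            return v + a - a.mean(dim=1, keepdim=True)  # (batch, num_actions)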