Corpus ID: 238583248

Learning Temporally-Consistent Representations for Data-Efficient Reinforcement Learning

@article{McInroe2021LearningTR,
  title={Learning Temporally-Consistent Representations for Data-Efficient Reinforcement Learning},
  author={Trevor McInroe and Lukas Sch{\"a}fer and Stefano V. Albrecht},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.04935}
}
Deep reinforcement learning (RL) agents that exist in high-dimensional state spaces, such as those composed of images, have interconnected learning burdens. Agents must learn an action-selection policy that completes their given task, which requires them to learn a representation of the state space that discerns between useful and useless information. The reward function is the only supervised feedback that RL agents receive, which causes a representation learning bottleneck that can manifest… 

Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning

A novel method, called value-consistent representation learning (VCR), is proposed to learn representations that are directly related to decision-making, and two implementations of this idea are developed for discrete and continuous action spaces, respectively.

Mix-up Consistent Cross Representations for Data-Efficient Reinforcement Learning

This paper proposes Mix-up Consistent Cross Representations (MCCR), a novel self-supervised auxiliary task that aims to improve data efficiency and encourage representation prediction. MCCR computes a contrastive loss between low-dimensional and high-dimensional representations of different state observations, boosting the mutual information between states and thereby improving data efficiency.
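
The two ingredients named above (mix-up and a contrastive loss between low- and high-dimensional views) can be sketched concretely. The following is a minimal, assumption-heavy sketch, not MCCR's actual implementation: the encoder and projector modules, the Beta mix-up coefficient, and the exact pairing of views are all illustrative choices.

# Hypothetical sketch: mix-up in observation space plus an InfoNCE loss
# between a low-dimensional and a (projected) high-dimensional
# representation of the same mixed observation. Not MCCR's exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F

def info_nce(queries, keys, temperature=0.1):
    # Matching (query_i, key_i) pairs are positives; every other key in
    # the batch acts as a negative.
    queries = F.normalize(queries, dim=-1)
    keys = F.normalize(keys, dim=-1)
    logits = queries @ keys.t() / temperature          # (B, B) similarities
    labels = torch.arange(queries.shape[0], device=queries.device)
    return F.cross_entropy(logits, labels)

def mixup_contrastive_loss(enc_low, enc_high, projector, obs_a, obs_b, alpha=0.4):
    lam = torch.distributions.Beta(alpha, alpha).sample()
    mixed = lam * obs_a + (1 - lam) * obs_b            # mix-up in input space
    z_low = enc_low(mixed)                             # low-dim view, (B, d)
    z_high = projector(enc_high(mixed))                # high-dim view -> (B, d)
    return info_nce(z_low, z_high)

# Example wiring with toy dimensions (all sizes assumed):
obs_dim, d_low, d_high = 64, 32, 256
enc_low = nn.Linear(obs_dim, d_low)
enc_high = nn.Linear(obs_dim, d_high)
projector = nn.Linear(d_high, d_low)
loss = mixup_contrastive_loss(enc_low, enc_high, projector,
                              torch.randn(8, obs_dim), torch.randn(8, obs_dim))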

Learning Representations for Control with Hierarchical Forward Models

Hierarchical k-Step Latent (HKSL) is proposed: an auxiliary task that learns representations via a hierarchy of forward models operating at varying magnitudes of step skipping, while also learning to communicate between levels in the hierarchy.
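
One plausible reading of that structure is sketched below: a two-level latent rollout in which the coarse forward model advances every few steps and its latent is fed into the fine model's input. The concatenation-based communication and every size here are assumptions, not HKSL's actual architecture.

# Hypothetical two-level sketch of forward models with step skipping.
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    def __init__(self, in_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, latent_dim))
    def forward(self, x):
        return self.net(x)

def hierarchical_rollout(fine, coarse, z_fine, z_coarse, actions, skip=3):
    # The fine model advances every step, conditioned on the most recent
    # coarse latent; the coarse model advances only every `skip` steps.
    preds = []
    for t in range(actions.shape[1]):
        a = actions[:, t]
        if t % skip == 0:
            z_coarse = coarse(torch.cat([z_coarse, a], dim=-1))
        z_fine = fine(torch.cat([z_fine, z_coarse, a], dim=-1))
        preds.append(z_fine)
    return torch.stack(preds, dim=1)                   # (B, T, latent_dim)

# Toy dimensions (assumed): latent 32, action 4, horizon 6
B, T, dz, da = 8, 6, 32, 4
fine = ForwardModel(dz + dz + da, dz)
coarse = ForwardModel(dz + da, dz)
out = hierarchical_rollout(fine, coarse, torch.randn(B, dz),
                           torch.randn(B, dz), torch.randn(B, T, da))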

Deep Reinforcement Learning for Multi-Agent Interaction

A broad overview of the ongoing research portfolio of the Autonomous Agents Research Group is provided and open problems for future directions are discussed.

References


Data-Efficient Reinforcement Learning with Self-Predictive Representations

The method, Self-Predictive Representations (SPR), trains an agent to predict its own latent state representations multiple steps into the future, using a learned transition model together with a target encoder whose weights are an exponential moving average of the agent's own parameters.
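
That sentence contains the whole recipe, so a compact sketch is possible: roll a transition model forward in latent space and regress each prediction against the latent that an EMA target encoder assigns to the true future observation. The paper's projection and prediction heads are omitted here, and all module shapes are assumed.

# Hedged SPR-style sketch; shapes and hyperparameters are illustrative.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 32))
target_encoder = copy.deepcopy(encoder)               # EMA copy, not trained
transition = nn.Sequential(nn.Linear(32 + 4, 128), nn.ReLU(),
                           nn.Linear(128, 32))        # (latent, action) -> latent

def spr_style_loss(obs_seq, act_seq):
    # obs_seq: (B, K+1, 64); act_seq: (B, K, 4). Roll the transition model
    # K steps forward in latent space and match each prediction (cosine
    # similarity) to the target encoder's latent of the true observation.
    z = encoder(obs_seq[:, 0])
    loss = 0.0
    K = act_seq.shape[1]
    for t in range(K):
        z = transition(torch.cat([z, act_seq[:, t]], dim=-1))
        with torch.no_grad():                          # targets carry no grad
            z_tgt = target_encoder(obs_seq[:, t + 1])
        loss = loss - F.cosine_similarity(z, z_tgt, dim=-1).mean()
    return loss / K

def ema_update(tau=0.99):
    # The target encoder tracks the online encoder as a moving average.
    for p, p_t in zip(encoder.parameters(), target_encoder.parameters()):
        p_t.data.mul_(tau).add_(p.data, alpha=1 - tau)

loss = spr_style_loss(torch.randn(8, 6, 64), torch.randn(8, 5, 4))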

Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model

The stochastic latent actor-critic (SLAC) algorithm is proposed: a sample-efficient and high-performing RL algorithm for learning policies for complex continuous control tasks directly from high-dimensional image inputs.

Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning

Learning a good representation is an essential component of deep reinforcement learning (RL), and representation learning is especially important in multitask and partially observable settings.

Reinforcement Learning with Prototypical Representations

Proto-RL is a self-supervised framework that ties representation learning to exploration through prototypical representations, which serve both as a summary of the agent's exploratory experience and as a basis for representing observations.
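
As a generic illustration of prototypes acting as a basis for representing observations (not Proto-RL's full pipeline, which also drives exploration from the prototypes), the sketch below soft-assigns encoded observations to a learned set of prototype vectors; every name and size is an assumption.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeHead(nn.Module):
    # Learnable prototype vectors; an observation is represented by its
    # soft assignment over the prototypes.
    def __init__(self, latent_dim=32, num_protos=16):
        super().__init__()
        self.protos = nn.Parameter(torch.randn(num_protos, latent_dim))

    def forward(self, z, temperature=0.1):
        z = F.normalize(z, dim=-1)
        protos = F.normalize(self.protos, dim=-1)
        logits = z @ protos.t() / temperature          # (B, num_protos)
        return F.softmax(logits, dim=-1)               # soft assignment

encoder = nn.Linear(64, 32)
head = PrototypeHead()
assignments = head(encoder(torch.randn(8, 64)))        # basis for observations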

DeepMDP: Learning Continuous Latent Space Models for Representation Learning

This work introduces the DeepMDP, a parameterized latent space model trained by minimizing two tractable losses: prediction of rewards and prediction of the distribution over next latent states. It shows that optimizing these objectives guarantees the quality of the latent space as a representation of the state space.
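
The two tractable losses are concrete enough to sketch directly. For simplicity the sketch below regresses a point estimate of the next latent with an L2 loss, whereas DeepMDP proper matches the distribution over next latent states; that simplification and all sizes are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

dz, da = 32, 4
encoder = nn.Linear(64, dz)
reward_head = nn.Linear(dz + da, 1)                    # predicts r(s, a)
latent_dynamics = nn.Linear(dz + da, dz)               # predicts next latent

def deepmdp_style_losses(obs, act, rew, next_obs):
    z = encoder(obs)
    za = torch.cat([z, act], dim=-1)
    reward_loss = F.mse_loss(reward_head(za).squeeze(-1), rew)
    # Point-estimate transition loss; stopping gradients through the
    # next-state latent is one design choice among several.
    with torch.no_grad():
        z_next = encoder(next_obs)
    transition_loss = F.mse_loss(latent_dynamics(za), z_next)
    return reward_loss + transition_loss

loss = deepmdp_style_losses(torch.randn(8, 64), torch.randn(8, da),
                            torch.randn(8), torch.randn(8, 64))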

Decoupling Representation Learning from Reinforcement Learning

A new unsupervised learning task, Augmented Temporal Contrast (ATC), is introduced, which trains a convolutional encoder to associate pairs of observations separated by a short time difference, under image augmentations and using a contrastive loss.
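
The pairing rule maps directly onto a standard contrastive setup: augmented observations a short time apart form positive pairs, and other batch elements serve as negatives. The sketch below uses a bilinear contrastive score; the additive-noise stand-in for augmentation (the paper augments pixels) and the omission of the momentum target encoder's update are simplifications, and all sizes are assumed.

import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Linear(64, 32)                            # stands in for the conv encoder
W = nn.Parameter(torch.eye(32))                        # bilinear contrastive weights

def augment(obs):
    # Stand-in for image augmentation, kept self-contained here.
    return obs + 0.1 * torch.randn_like(obs)

def atc_style_loss(obs_t, obs_t_plus_k, temperature=0.1):
    # Observations a short time apart form positive pairs; other elements
    # of the batch serve as negatives.
    anchors = encoder(augment(obs_t))                   # (B, 32)
    with torch.no_grad():
        targets = encoder(augment(obs_t_plus_k))        # the paper uses a
                                                        # momentum target encoder
    logits = anchors @ W @ targets.t() / temperature    # (B, B)
    labels = torch.arange(obs_t.shape[0])
    return F.cross_entropy(logits, labels)

loss = atc_style_loss(torch.randn(8, 64), torch.randn(8, 64))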

Unsupervised State Representation Learning in Atari

This work introduces a method that learns state representations by maximizing mutual information across spatially and temporally distinct features of a neural encoder of the observations. It also introduces a new benchmark based on Atari 2600 games that evaluates representations by how well they capture the ground-truth state variables.
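
One loss in this family can be sketched as a global-local contrast: the global representation of frame t is contrasted against each spatial location's local feature of frame t+1. The tiny conv encoder and all sizes below are assumptions, not the paper's architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)      # local features (B, C, H, W)
pool_proj = nn.Linear(16, 16)                           # global projection

def global_local_infonce(frames_t, frames_t1, temperature=0.5):
    local_t1 = conv(frames_t1)                          # (B, C, H, W)
    B, C, H, W = local_t1.shape
    g_t = pool_proj(conv(frames_t).mean(dim=(2, 3)))    # global vec of frame t
    # One InfoNCE per spatial location: the co-located feature of the next
    # frame from the same trajectory is the positive.
    local = local_t1.permute(2, 3, 0, 1)                # (H, W, B, C)
    loss = 0.0
    for i in range(H):
        for j in range(W):
            logits = g_t @ local[i, j].t() / temperature  # (B, B)
            labels = torch.arange(B)
            loss = loss + F.cross_entropy(logits, labels)
    return loss / (H * W)

loss = global_local_infonce(torch.randn(4, 3, 8, 8), torch.randn(4, 3, 8, 8))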

Improving Sample Efficiency in Model-Free Reinforcement Learning from Images

A simple approach is presented that matches state-of-the-art model-free and model-based algorithms on MuJoCo control tasks while demonstrating robustness to observational noise, surpassing existing approaches in this setting.

Learning Latent Dynamics for Planning from Pixels

The Deep Planning Network (PlaNet) is proposed, a purely model-based agent that learns the environment dynamics from images and chooses actions through fast online planning in latent space using a latent dynamics model with both deterministic and stochastic transition components.
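
The "fast online planning in latent space" is, in the paper, the cross-entropy method (CEM) over action sequences evaluated inside the learned model. The sketch below shows that planner with a purely deterministic stand-in for the latent dynamics (PlaNet's RSSM also has a stochastic path); all sizes and hyperparameters are assumptions.

import torch
import torch.nn as nn

dz, da, horizon = 32, 4, 8
dynamics = nn.Linear(dz + da, dz)                      # deterministic stand-in
reward_head = nn.Linear(dz, 1)                         # for the learned model

@torch.no_grad()
def cem_plan(z0, iters=5, candidates=64, elites=8):
    # Cross-entropy method over action sequences: sample candidates,
    # score them by rolling the latent model forward and summing predicted
    # rewards, then refit a Gaussian to the elite sequences.
    mean = torch.zeros(horizon, da)
    std = torch.ones(horizon, da)
    for _ in range(iters):
        acts = mean + std * torch.randn(candidates, horizon, da)
        z = z0.expand(candidates, dz)
        ret = torch.zeros(candidates)
        for t in range(horizon):
            z = dynamics(torch.cat([z, acts[:, t]], dim=-1))
            ret = ret + reward_head(z).squeeze(-1)
        elite_idx = ret.topk(elites).indices
        mean = acts[elite_idx].mean(dim=0)
        std = acts[elite_idx].std(dim=0)
    return mean[0]                                      # first action of the plan

action = cem_plan(torch.randn(1, dz))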

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning

A theoretically motivated policy similarity metric (PSM) for measuring behavioral similarity between states is introduced, along with policy similarity embeddings (PSEs) learned from it. PSEs are demonstrated to improve generalization on diverse benchmarks, including LQR with spurious correlations, a jumping task from pixels, and the Distracting DM Control Suite.