Temporal Abstractions-Augmented Temporally Contrastive Learning: An Alternative to the Laplacian in RL

Akram Erraqabi, Marlos C. Machado, Mingde Zhao, Sainbayar Sukhbaatar, Alessandro Lazaric, Ludovic Denoyer, Yoshua Bengio
In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting, with applications ranging from skill discovery to reward shaping. Recently, learning the Laplacian representation has been framed as the optimization of a temporally-contrastive objective to overcome its computational limitations in large (or continuous) state spaces. However, this approach requires uniform access to all states in the state space, overlooking the exploration problem… 

Reachability-Aware Laplacian Representation in Reinforcement Learning

This paper introduces a Reachability-Aware Laplacian Representation (RA-LapRep) and shows, through both theoretical explanations and experimental results, that it captures inter-state reachability better than LapRep.

Deep Laplacian-based Options for Temporally-Extended Exploration

This paper introduces a fully online deep RL algorithm for discovering Laplacian-based options and compares it to several state-of-the-art exploration methods, showing that the approach is effective, general, and especially promising in non-stationary settings.

Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks

Through a series of experiments on the Arcade Learning Environment, it is demonstrated that proto-value networks produce rich features that may be used to obtain performance comparable to established algorithms, using only linear approximation and a small number of interactions with the environment’s reward function.

Temporal Abstraction in Reinforcement Learning with the Successor Representation

This paper argues that the successor representation, which encodes states based on the pattern of state visitation that follows them, can be seen as a natural substrate for the discovery and use of temporal abstractions. It takes a big-picture view of recent results, showing how the successor representation can be used to discover options that facilitate either temporally-extended exploration or planning.


In deep reinforcement learning (RL), an agent maps observations to a policy or return prediction by means of a neural network, which transforms observations into a series of successively refined features that are linearly combined by the final layer into the desired prediction.

Eigenoption Discovery through the Deep Successor Representation

This paper proposes an algorithm that discovers eigenoptions while learning non-linear state representations from raw pixels, and exploits recent successes in the deep reinforcement learning literature and the equivalence between proto-value functions and the successor representation.

Efficient Exploration in Reinforcement Learning through Time-Based Representations

This dissertation argues that agents’ exploration strategy can be guided by the process of representation learning, and supports this claim by introducing different exploration approaches for RL algorithms that are applicable to complex environments with sparse rewards.

Reinforcement Learning with Prototypical Representations

Proto-RL is a self-supervised framework that ties representation learning with exploration through prototypical representations that serve as a summarization of the exploratory experience of an agent as well as a basis for representing observations.

The Laplacian in RL: Learning Representations with Efficient Approximations

This paper presents a fully general and scalable method for approximating the eigenvectors of the Laplacian in a model-free RL context, and empirically shows that it generalizes beyond the tabular, finite-state setting.

Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning

The algorithm provably explores the environment with sample complexity that scales polynomially in the number of latent states and the time horizon and does not depend on the size of the observation space, which could be infinitely large. This enables sample-efficient global policy optimization for any reward function.

Decoupling Representation Learning from Reinforcement Learning

This paper introduces a new unsupervised learning task, Augmented Temporal Contrast (ATC), which trains a convolutional encoder to associate pairs of observations separated by a short time difference, under image augmentations and using a contrastive loss.

Learning Latent Dynamics for Planning from Pixels

This paper proposes the Deep Planning Network (PlaNet), a purely model-based agent that learns the environment dynamics from images and chooses actions through fast online planning in latent space, using a latent dynamics model with both deterministic and stochastic transition components.

Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching

This paper builds on the mutual information framework for skill discovery and introduces UPSIDE, which addresses the coverage-directedness trade-off in several ways, including policies with a decoupled structure: a directed skill, trained to reach a specific region, followed by a diffusing part that induces local coverage.

Fast Task Inference with Variational Intrinsic Successor Features

This paper introduces Variational Intrinsic Successor FeatuRes (VISR), a novel algorithm which learns controllable features that can be leveraged to provide enhanced generalization and fast task inference through the successor feature framework.