Corpus ID: 244714075

Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions

Bogdan Mazoure, Ilya Kostrikov, Ofir Nachum, Jonathan Tompson
Reinforcement learning (RL) agents are widely used to solve complex sequential decision-making tasks, but they still have difficulty generalizing to scenarios not seen during training. While prior online approaches have demonstrated that training signals beyond the reward function, e.g. self-supervised learning (SSL) objectives, can improve the generalization capabilities of RL agents, these approaches struggle in the offline RL setting, i.e. learning from a static dataset. We show that performance of… 


Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories

A simple meta-algorithmic pipeline is developed that learns an inverse-dynamics model on the labelled data to obtain proxy labels for the unlabelled data, followed by the use of any offline RL algorithm on the true- and proxy-labelled trajectories.

Domain Generalization: A Survey

A comprehensive literature review in DG is provided to summarize the developments over the past decade and cover the background by formally defining DG and relating it to other relevant fields like domain adaptation and transfer learning.

Cross-Trajectory Representation Learning for Zero-Shot Generalization in RL

This work proposes Cross Trajectory Representation Learning (CTRL), a method that runs within an RL agent and conditions its encoder to recognize behavioral similarity in observations by applying a novel SSL objective to pairs of trajectories from the agent’s policies.

DARLA: Improving Zero-Shot Transfer in Reinforcement Learning

A new multi-stage RL agent, DARLA (DisentAngled Representation Learning Agent), learns to see before learning to act and significantly outperforms conventional baselines in zero-shot domain adaptation scenarios.

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning

A theoretically motivated policy similarity metric (PSM) for measuring behavioral similarity between states is introduced, and the resulting policy similarity embeddings (PSEs) are shown to improve generalization on diverse benchmarks, including LQR with spurious correlations, a jumping task from pixels, and the Distracting DM Control Suite.

S4RL: Surprisingly Simple Self-Supervision for Offline Reinforcement Learning

This paper studies the effectiveness of data augmentations on the state space, evaluating seven different augmentation schemes and how they behave with existing offline RL algorithms, and combines the best-performing augmentation scheme with a state-of-the-art Q-learning technique.

Decoupling Representation Learning from Reinforcement Learning

A new unsupervised learning task, called Augmented Temporal Contrast (ATC), which trains a convolutional encoder to associate pairs of observations separated by a short time difference, under image augmentations and using a contrastive loss.
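
The ATC objective can be sketched as a standard InfoNCE contrastive loss, where the positive for an encoded observation is an observation a few time steps later and other encodings in the batch serve as negatives. The `info_nce` helper below is an illustrative sketch with assumed names, not the paper's implementation; encodings are plain Python vectors for clarity.

```python
import math

def info_nce(anchor, candidates, pos_index, temperature=0.1):
    """InfoNCE loss on one anchor encoding: the anchor should score
    highest (by cosine similarity) against the positive candidate,
    i.e. the encoding of a temporally nearby observation."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    logits = [cosine(anchor, c) / temperature for c in candidates]
    # Numerically stable log-partition over all candidates.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    # Cross-entropy of the positive against the softmax over candidates.
    return -(logits[pos_index] - log_z)
```

A well-trained encoder drives the positive's similarity up and the negatives' down, so the loss for the true positive index is smaller than for any negative index.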

Data-Efficient Reinforcement Learning with Self-Predictive Representations

The method, Self-Predictive Representations (SPR), trains an agent to predict its own latent state representations multiple steps into the future, using a target encoder whose parameters are an exponential moving average of the agent’s own parameters together with a learned transition model.
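
The two ingredients named in that summary — the exponential-moving-average target encoder and the latent prediction objective (a negative cosine similarity in SPR) — can be sketched as follows. This is a minimal illustration with assumed names, operating on flat parameter and latent vectors rather than real networks.

```python
import math

def ema_update(online, target, tau=0.99):
    """Target-encoder update: target <- tau * target + (1 - tau) * online.
    The target network therefore trails the online network slowly."""
    return [tau * t + (1 - tau) * o for o, t in zip(online, target)]

def spr_loss(predicted, target):
    """Negative cosine similarity between the predicted future latent
    (from the learned transition model) and the target encoder's latent."""
    dot = sum(p * t for p, t in zip(predicted, target))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_t = math.sqrt(sum(t * t for t in target))
    return -dot / (norm_p * norm_t)
```

Minimizing `spr_loss` pushes the cosine similarity toward 1 (loss toward -1), while the EMA target provides a slowly moving, stable prediction objective.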

Improving Sample Efficiency in Model-Free Reinforcement Learning from Images

A simple approach capable of matching state-of-the-art model-free and model-based algorithms on MuJoCo control tasks and demonstrating robustness to observational noise, surpassing existing approaches in this setting.

Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment

This paper augments a learned dynamics model with simple transformations that seek to capture potential changes in physical properties of the robot, leading to more robust policies.

Provable Representation Learning for Imitation with Contrastive Fourier Features

This work considers using offline experience datasets – potentially far from the target distribution – to learn low-dimensional state representations that provably accelerate the sample efficiency of downstream imitation learning.

Conservative Q-Learning for Offline Reinforcement Learning

Conservative Q-learning (CQL) is proposed to address limitations of offline RL methods by learning a conservative Q-function such that the expected value of a policy under this Q-function lower-bounds its true value.
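
The conservative term that produces this lower bound can be sketched for a discrete action space: CQL adds a penalty of the form log-sum-exp of the Q-values over all actions minus the Q-value of the dataset action, which pushes down Q-values of out-of-distribution actions relative to in-distribution ones. The helper below is an illustrative sketch, not the paper's implementation; in practice this penalty is weighted and added to the usual TD loss.

```python
import math

def cql_penalty(q_values, data_action):
    """CQL-style conservative penalty for one state with discrete actions:
    logsumexp(Q(s, .)) - Q(s, a_data). Minimizing it lowers Q on actions
    the behavior policy did not take, yielding a conservative Q-function."""
    # Numerically stable log-sum-exp over all actions.
    m = max(q_values)
    log_sum_exp = m + math.log(sum(math.exp(q - m) for q in q_values))
    return log_sum_exp - q_values[data_action]
```

Since the log-sum-exp always exceeds the maximum Q-value, the penalty is strictly positive whenever there is more than one action, so the learned Q-function is biased downward everywhere except where the data supports it.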