Corpus ID: 235694513

Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation

Nicklas Hansen, Hao Su, Xiaolong Wang

While agents trained by Reinforcement Learning (RL) can solve increasingly challenging tasks directly from visual observations, generalizing learned skills to novel environments remains difficult. Extensive use of data augmentation is a promising technique for improving generalization in RL, but it is often found to decrease sample efficiency and can even lead to divergence. In this paper, we investigate causes of instability when using data augmentation in common off-policy RL… 

A Comprehensive Survey of Data Augmentation in Visual Reinforcement Learning

A principled taxonomy of the existing augmentation techniques used in visual RL and an in-depth discussion on how to better leverage augmented data in different scenarios are presented.

Don't Touch What Matters: Task-Aware Lipschitz Data Augmentation for Visual Reinforcement Learning

Task-aware Lipschitz Data Augmentation (TLDA) is proposed for visual RL. It explicitly identifies the task-correlated pixels with large Lipschitz constants and, for stability, augments only the task-irrelevant pixels, which improves both sample efficiency and generalization.
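Only the final step of this idea is sketched below: given a boolean mask of task-correlated pixels, keep those pixels from the original observation and take every other pixel from an augmented view. How TLDA computes the mask from Lipschitz constants is not reproduced; `task_mask` and `masked_augment` are hypothetical names, and the mask is simply assumed to be available.

```python
from typing import List

Image = List[List[float]]   # single-channel H x W image as nested lists
Mask = List[List[bool]]     # True = task-correlated pixel (assumed precomputed)


def masked_augment(original: Image, augmented: Image, task_mask: Mask) -> Image:
    """Keep task-correlated pixels unchanged; use augmented values everywhere else."""
    return [
        [o if keep else a for o, a, keep in zip(row_o, row_a, row_m)]
        for row_o, row_a, row_m in zip(original, augmented, task_mask)
    ]
```

Because the selection is per pixel, any augmentation that preserves image size (color jitter, random convolution, and so on) can supply the `augmented` view.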

Understanding the Mechanism behind Data Augmentation’s Success on Image-based RL

This work investigates why random shifts are useful augmentations for image-based RL and shows that random shifts increase both the shift-equivariance and the shift-invariance of the encoder.
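A random shift is commonly implemented as "pad, then crop back to the original size at a random offset". The sketch below assumes that formulation with edge-replicating padding on a single-channel image stored as nested lists; `random_shift` and its `pad` parameter are illustrative names, not taken from the paper.

```python
import random
from typing import List, Optional

Image = List[List[float]]  # single-channel H x W image stored as nested lists


def random_shift(img: Image, pad: int = 4,
                 rng: Optional[random.Random] = None) -> Image:
    """Shift `img` by up to `pad` pixels along each axis.

    The image is padded by replicating its border pixels, then cropped back
    to the original H x W at a uniformly random offset.
    """
    rng = rng or random.Random()
    h, w = len(img), len(img[0])
    # Clamping coordinates into the valid range replicates edge pixels,
    # so the shifted-in border is not filled with black.
    padded = [
        [img[min(max(r - pad, 0), h - 1)][min(max(c - pad, 0), w - 1)]
         for c in range(w + 2 * pad)]
        for r in range(h + 2 * pad)
    ]
    # An offset of `pad` reproduces the input; 0 or 2 * pad is the largest shift.
    dy = rng.randint(0, 2 * pad)
    dx = rng.randint(0, 2 * pad)
    return [row[dx:dx + w] for row in padded[dy:dy + h]]
```

In practice this would be applied independently to each observation in a sampled batch and vectorized over image tensors; the nested-list version only makes the pad-and-crop logic explicit.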

On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning

This work proposes Model-Based Cross-Task Transfer (XTRA), a framework for sample-efficient online RL with scalable pretraining and online cross-task finetuning of learned world models, and achieves substantial improvements on the Atari100k benchmark over a baseline trained from scratch.

Deep Transformer Q-Networks for Partially Observable Reinforcement Learning

This work proposes Deep Transformer Q-Networks (DTQN), a novel architecture that uses transformers and self-attention to encode an agent's history, and demonstrates that the transformer can solve partially observable tasks faster and more stably than previous recurrent approaches.

Does Self-supervised Learning Really Improve Reinforcement Learning from Pixels?

This work suggests that no single self-supervised loss or image augmentation method dominates across all environments, and that the current framework for jointly optimizing SSL and RL is limited.

Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning

This work introduces TEmporal Disentanglement (TED), a self-supervised auxiliary task that leads to disentangled representations by exploiting the sequential nature of RL observations, as a step toward making RL algorithms more robust for real-world deployment and life-long learning.

Look where you look! Saliency-guided Q-networks for visual RL tasks

SGQN vastly improves the generalization capability of Soft Actor-Critic agents and outperforms existing state-of-the-art methods on the DeepMind Control Generalization benchmark, setting a new reference in terms of training efficiency, generalization gap, and policy interpretability.

Graph Inverse Reinforcement Learning from Diverse Videos

This paper argues that the true potential of third-person IRL lies in increasing the diversity of videos for better scaling, and proposes to perform graph abstraction on the videos followed by temporal matching in the graph space to measure the task progress.

Intrinsically Motivated Self-supervised Learning in Reinforcement Learning

It is formally shown that the self-supervised loss can be decomposed into exploration for novel states and robustness improvement from nuisance elimination, and IM-SSR can be plugged into any reinforcement learning algorithm with self-supervised auxiliary objectives at nearly no additional cost.



Generalization in Reinforcement Learning by Soft Data Augmentation

This work proposes SOft Data Augmentation (SODA), a method that decouples augmentation from policy learning and is found to significantly improve sample efficiency, generalization, and training stability over state-of-the-art vision-based RL methods.
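One way to read "decoupling augmentation from policy learning" is that the RL loss sees only clean observations, while augmented views enter training through a separate auxiliary consistency loss. The sketch below shows only such a consistency term, the squared distance between L2-normalized embeddings of an augmented and a clean view. This is an illustrative assumption, not SODA's exact objective; the encoder, any projection heads, and `consistency_loss` itself are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def _l2_normalize(v: Vector, eps: float = 1e-8) -> Vector:
    """Scale `v` to unit length; `eps` guards against division by zero."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def consistency_loss(aug_embedding: Vector, clean_embedding: Vector) -> float:
    """Squared distance between unit-normalized embeddings (equal to 2 - 2*cosine)."""
    a = _l2_normalize(aug_embedding)
    b = _l2_normalize(clean_embedding)
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

The loss is 0 when the two embeddings point in the same direction and 4 when they point in opposite directions, so minimizing it pulls the representation of an augmented view toward that of its clean counterpart without the augmentation ever entering the policy or value loss.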

Reinforcement Learning with Augmented Data

It is shown that augmentations such as random translate, crop, color jitter, patch cutout, random convolutions, and amplitude scale can enable simple RL algorithms to outperform complex state-of-the-art methods across common benchmarks.
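Of the augmentations listed, patch cutout is easy to state precisely: overwrite one randomly sized and randomly placed rectangle with a constant. The sketch below assumes a single-channel image as nested lists and a fill value of 0; the size bounds and the name `patch_cutout` are illustrative choices, not the paper's settings.

```python
import random
from typing import List, Optional

Image = List[List[float]]  # single-channel H x W image as nested lists


def patch_cutout(img: Image, min_size: int = 2, max_size: int = 4,
                 fill: float = 0.0,
                 rng: Optional[random.Random] = None) -> Image:
    """Return a copy of `img` with one random rectangle set to `fill`.

    Assumes `min_size` <= min(H, W) so a valid patch always exists.
    """
    rng = rng or random.Random()
    h, w = len(img), len(img[0])
    # Patch height/width are sampled independently, capped by the image size.
    ph = rng.randint(min_size, min(max_size, h))
    pw = rng.randint(min_size, min(max_size, w))
    top = rng.randint(0, h - ph)
    left = rng.randint(0, w - pw)
    out = [row[:] for row in img]  # copy so the stored observation is untouched
    for r in range(top, top + ph):
        for c in range(left, left + pw):
            out[r][c] = fill
    return out
```

Returning a copy matters in off-policy RL: observations stored in the replay buffer must stay clean so that each sample can receive a fresh augmentation.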

Improving Sample Efficiency in Model-Free Reinforcement Learning from Images

This work presents a simple approach that matches state-of-the-art model-free and model-based algorithms on MuJoCo control tasks and is robust to observational noise, surpassing existing approaches in this setting.

Automatic Data Augmentation for Generalization in Deep Reinforcement Learning

This paper compares three approaches for automatically finding an appropriate augmentation and shows that the resulting agent outperforms other baselines specifically designed to improve generalization in RL, learning policies and representations that are more robust to changes in the environment that do not affect the agent.

Asymmetric Actor Critic for Image-Based Robot Learning

This work exploits full state observability in the simulator to train better policies that take only partial observations (RGBD images) as input, combines this method with domain randomization, and demonstrates real-robot experiments on several tasks such as picking, pushing, and moving a block.

Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels

The addition of the augmentation method dramatically improves SAC's performance, enabling it to reach state-of-the-art performance on the DeepMind Control Suite and surpass both model-based methods and the recently proposed contrastive learning method CURL.

Decoupling Representation Learning from Reinforcement Learning

This work introduces a new unsupervised learning task, Augmented Temporal Contrast (ATC), which trains a convolutional encoder to associate pairs of observations separated by a short time difference, under image augmentations and using a contrastive loss.
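The contrastive part can be sketched as an InfoNCE loss over a batch. Anchor i (the embedding of an observation) should score highest against positive i (the embedding of the observation a short time later), and the other positives in the batch serve as negatives. This sketch assumes the embeddings are already computed and uses a plain dot-product similarity; ATC's encoder, augmentations, and exact similarity function are not reproduced, and `info_nce_loss` is a hypothetical name.

```python
import math
from typing import List

Vector = List[float]


def info_nce_loss(anchors: List[Vector], positives: List[Vector],
                  temperature: float = 1.0) -> float:
    """Mean InfoNCE loss: row i is a softmax classification with target i."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        # Similarity of anchor i to every positive in the batch.
        logits = [sum(a * p for a, p in zip(anchors[i], positives[j])) / temperature
                  for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]  # cross-entropy against the matching pair
    return total / n
```

With anchors `[[1, 0], [0, 1]]` matched to identical positives, each row's loss is log(e + 1) - 1. Swapping the positives raises it to log(e + 1), which is the signal that pushes the encoder to map temporally close observations to similar embeddings.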

Visual Foresight: Model-Based Deep Reinforcement Learning for Vision-Based Robotic Control

It is demonstrated that visual MPC can generalize to never-before-seen objects (both rigid and deformable) and solve a range of user-defined object manipulation tasks using the same model.

Data-Efficient Reinforcement Learning with Self-Predictive Representations

The method, Self-Predictive Representations (SPR), trains an agent to predict its own latent state representations multiple steps into the future using an encoder which is an exponential moving average of the agent’s parameters and a learned transition model.
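The target encoder described here is an exponential moving average (EMA) of the online parameters: θ_target ← τ·θ_target + (1 − τ)·θ_online after each update. A minimal sketch, assuming parameters are stored as a dict of flat float lists; the name `ema_update` and the default τ = 0.99 are illustrative, not values from the paper.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # parameter name -> flattened values


def ema_update(target: Params, online: Params, tau: float = 0.99) -> None:
    """In place: target <- tau * target + (1 - tau) * online, for every parameter."""
    for name, online_vals in online.items():
        target_vals = target[name]
        for i, v in enumerate(online_vals):
            target_vals[i] = tau * target_vals[i] + (1.0 - tau) * v
```

With τ close to 1 the target encoder changes slowly, giving stable prediction targets. τ = 0 would copy the online encoder outright, and τ = 1 would freeze the target.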

End-to-End Training of Deep Visuomotor Policies

This paper develops a method that can be used to learn policies that map raw image observations directly to torques at the robot's motors, trained using a partially observed guided policy search method, with supervision provided by a simple trajectory-centric reinforcement learning method.