• Corpus ID: 236134152

Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning

Denis Yarats, Rob Fergus, Alessandro Lazaric, Lerrel Pinto
We present DrQ-v2, a model-free reinforcement learning (RL) algorithm for visual continuous control. DrQ-v2 builds on DrQ, an off-policy actor-critic approach that uses data augmentation to learn directly from pixels. We introduce several improvements that yield state-of-the-art results on the DeepMind Control Suite. Notably, DrQ-v2 is able to solve complex humanoid locomotion tasks directly from pixel observations, previously unattained by model-free RL. DrQ-v2 is conceptually simple, easy to… 
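The data augmentation at the core of DrQ and DrQ-v2 is a random shift of the pixel observation: the image is padded by replicating its edge pixels and a window of the original size is cropped at a random offset. A minimal pure-Python sketch (the function name and list-of-lists image format are illustrative; real implementations operate on tensors):

```python
import random

def random_shift(img, pad=4, rng=random):
    """Randomly shift a 2D image by up to `pad` pixels via pad-and-crop.

    Pads the image by replicating edge pixels, then crops a window of the
    original size at a random offset -- the augmentation DrQ-style methods
    apply to pixel observations before each update.
    """
    h, w = len(img), len(img[0])
    # Replicate-pad each row horizontally, then pad the rows vertically.
    padded_rows = [[row[0]] * pad + row + [row[-1]] * pad for row in img]
    padded = [padded_rows[0]] * pad + padded_rows + [padded_rows[-1]] * pad
    # Crop an h x w window at a random offset in [0, 2 * pad].
    dy = rng.randrange(2 * pad + 1)
    dx = rng.randrange(2 * pad + 1)
    return [r[dx:dx + w] for r in padded[dy:dy + h]]
```

The output always has the input's shape, so the downstream encoder needs no changes; the augmentation only perturbs which pixels land where.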


VRL3: A Data-Driven Framework for Visual Deep Reinforcement Learning

On a set of challenging hand manipulation tasks with sparse rewards and realistic visual inputs, VRL3 achieves on average 780% better sample efficiency than the previous SOTA and solves the tasks with only 10% of the computation, demonstrating the potential of data-driven deep reinforcement learning.

Sim-to-real via latent prediction: Transferring visual non-prehensile manipulation policies

This work proposes a sim-to-real technique that trains a Soft Actor-Critic agent together with a decoupled feature extractor and a latent-space dynamics model, and shows how this architecture allows transferring a trained agent from simulation to reality without retraining or finetuning the control policy, using real-world data only to adapt the feature extractor.

PIRLNav: Pretraining with Imitation and RL Finetuning for ObjectNav

This work presents a two-stage learning scheme of IL pretraining on human demonstrations followed by RL finetuning, and investigates whether human demonstrations can be replaced with 'free' sources of demonstrations, e.g.

Tackling Visual Control via Multi-View Exploration Maximization

MEM can significantly improve the sample efficiency and generalization ability of the RL agent, facilitating real-world problems with high-dimensional observations and sparse rewards, and outperforms benchmark schemes despite a simpler architecture and higher efficiency.

Pre-Trained Image Encoder for Generalizable Visual Reinforcement Learning

Surprisingly, it is found that the early layers in an ImageNet pre-trained ResNet model could provide rather generalizable representations for visual RL, and this paper proposes PIE-G, a simple yet effective framework that can generalize to the unseen visual scenarios in a zero-shot manner.

StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning

The proposed StARformer outperforms the state-of-the-art Transformer-based method on image-based Atari and DeepMind Control Suite benchmarks, in both offline-RL and imitation learning settings, and is also more compliant with longer sequences of inputs.

Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations

Simple modifications to two state-of-the-art vision-based online reinforcement learning algorithms, DreamerV2 and DrQ-v2, are shown to outperform prior work and establish a competitive baseline, and several key desiderata unique to offline RL from visual observations are presented.

Watch and Match: Supercharging Imitation with Regularized Optimal Transport

This work presents Regularized Optimal Transport (ROT), a new imitation learning algorithm that builds on recent advances in optimal-transport-based state-matching, and demonstrates that adaptively combining state-matching rewards with behavior cloning can significantly accelerate imitation even without task-specific rewards.

Unsupervised Model-based Pre-training for Data-efficient Control from Pixels

This work designs an effective unsupervised RL strategy for data-efficient visual control and demonstrates robust performance on the Real-Word RL benchmark, hinting that the approach generalizes to noisy environments.

A Comprehensive Survey of Data Augmentation in Visual Reinforcement Learning

A principled taxonomy of the existing augmentation techniques used in visual RL and an in-depth discussion on how to better leverage augmented data in different scenarios are presented.

Continuous control with deep reinforcement learning

This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.

Improving Sample Efficiency in Model-Free Reinforcement Learning from Images

A simple approach is presented that matches state-of-the-art model-free and model-based algorithms on MuJoCo control tasks and demonstrates robustness to observational noise, surpassing existing approaches in this setting.

Robust Policies via Mid-Level Visual Representations: An Experimental Study in Manipulation and Navigation

Compared to other approaches for incorporating invariances, such as domain randomization and learning-from-scratch, asynchronously trained mid-level representations scale better: both to harder problems and to larger domain shifts.

Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels

The addition of the augmentation method dramatically improves SAC's performance, enabling it to reach state-of-the-art performance on the DeepMind Control Suite, surpassing model-based methods and the recently proposed contrastive-learning method CURL.
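Beyond augmenting inputs, DrQ regularizes the critic by averaging its TD target over several augmented views of the next observation, which lowers the variance of the target. A hedged sketch of that averaging step (function and argument names are illustrative; `next_qs[k]` stands for the target critic's value on the k-th augmented view):

```python
def drq_target(reward, next_qs, gamma=0.99):
    """DrQ-style TD target averaged over K augmented next observations.

    next_qs[k] = Q_target(aug_k(s'), a') for K random augmentations of the
    next observation; averaging them reduces the variance of the critic's
    regression target.
    """
    return reward + gamma * sum(next_qs) / len(next_qs)
```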

Data-Efficient Reinforcement Learning with Momentum Predictive Representations

This work trains an agent to predict its own latent state representations multiple steps into the future using an encoder which is an exponential moving average of the agent's parameters, and makes predictions using a learned transition model.
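The "exponential moving average of the agent's parameters" mentioned above is the standard momentum (EMA) target-network update, applied parameter-wise. A minimal sketch with parameters stored in a dict (the dict representation is an illustrative simplification of a real parameter tree):

```python
def ema_update(target_params, online_params, tau=0.99):
    """Exponential-moving-average update for a momentum target encoder.

    target <- tau * target + (1 - tau) * online, applied per parameter.
    A slowly moving target like this provides stable prediction targets
    for the online encoder's multi-step latent predictions.
    """
    return {k: tau * target_params[k] + (1 - tau) * online_params[k]
            for k in target_params}
```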

Learning Visual Feature Spaces for Robotic Manipulation with Deep Spatial Autoencoders

An approach that automates state-space construction by learning a state representation directly from camera images by using a deep spatial autoencoder to acquire a set of feature points that describe the environment for the current task, such as the positions of objects.

Generalization in Reinforcement Learning by Soft Data Augmentation

SOft Data Augmentation (SODA) is proposed, a method that decouples augmentation from policy learning and is found to significantly advance sample efficiency, generalization, and stability in training over state-of-the-art vision-based RL methods.

Decoupling Representation Learning from Reinforcement Learning

A new unsupervised learning task, called Augmented Temporal Contrast (ATC), is proposed, which trains a convolutional encoder to associate pairs of observations separated by a short time difference, under image augmentations and using a contrastive loss.
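The contrastive loss in objectives of this family is typically InfoNCE: a cross-entropy over similarity scores in which the temporally matched pair is the positive class. A self-contained sketch for a single anchor (encoder, projection head, and similarity function are omitted; `sims` stands in for precomputed similarity scores):

```python
import math

def info_nce(sims, pos_index, temperature=0.1):
    """InfoNCE loss for one anchor: softmax cross-entropy on similarities.

    `sims` are similarity scores between the anchor's representation and a
    batch of candidate representations; `pos_index` marks the temporally
    matched positive. Returns -log softmax(sims / temperature)[pos_index].
    """
    scaled = [s / temperature for s in sims]
    m = max(scaled)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(s - m) for s in scaled))
    return log_denom - scaled[pos_index]
```

With one positive and one equally similar negative, the loss is log 2, i.e. the classifier is at chance; training pushes it below that.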

Automatic Data Augmentation for Generalization in Deep Reinforcement Learning

This paper compares three approaches for automatically finding an appropriate augmentation and shows that their agent outperforms other baselines specifically designed to improve generalization in RL and learns policies and representations that are more robust to changes in the environment that do not affect the agent.

Soft Actor-Critic Algorithms and Applications

Soft Actor-Critic (SAC), the recently introduced off-policy actor-critic algorithm based on the maximum-entropy RL framework, achieves state-of-the-art performance, outperforming prior on-policy and off-policy methods in sample efficiency and asymptotic performance.
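The maximum-entropy framework shows up concretely in SAC's critic target, which augments the usual TD target with an entropy bonus weighted by a temperature alpha. A hedged sketch of that target for a single transition (names are illustrative; in the full algorithm `next_q` is the minimum of two target critics and alpha can be tuned automatically):

```python
def soft_q_target(reward, next_q, log_pi, alpha=0.2, gamma=0.99):
    """Entropy-regularized TD target from the SAC objective.

    y = r + gamma * (Q(s', a') - alpha * log pi(a'|s')), where a' is
    sampled from the current policy; the -alpha * log_pi term rewards
    policy entropy, encouraging exploration.
    """
    return reward + gamma * (next_q - alpha * log_pi)
```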