Corpus ID: 230770431

Reinforcement Learning with Latent Flow

Authors: Wenling Shang, Xiaofei Wang, A. Srinivas, Aravind Rajeswaran, Yang Gao, P. Abbeel, Michael Laskin
Temporal information is essential to learning effective policies with Reinforcement Learning (RL). However, current state-of-the-art RL algorithms either assume that such information is given as part of the state space or, when learning from pixels, use the simple heuristic of frame-stacking to implicitly capture temporal information present in the image observations. This heuristic is in contrast to the current paradigm in video classification architectures, which utilize explicit encodings of… 
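
The frame-stacking heuristic the abstract refers to can be sketched as follows (class and method names are illustrative, not from the paper): the k most recent observations are concatenated along the channel axis, so temporal information is only represented implicitly in the input.

```python
from collections import deque
import numpy as np

class FrameStack:
    """Minimal sketch of the frame-stacking heuristic: keep the k most
    recent frames and concatenate them along the channel axis."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, obs):
        # Repeat the first frame so the stack is full from the start.
        for _ in range(self.k):
            self.frames.append(obs)
        return np.concatenate(self.frames, axis=-1)

    def step(self, obs):
        self.frames.append(obs)
        return np.concatenate(self.frames, axis=-1)
```

For 84x84 single-channel frames with k=4, the stacked observation has shape (84, 84, 4); the network must then infer motion from the raw stack rather than from an explicit temporal encoding.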


Temporal shift reinforcement learning

It is shown that TSRL outperforms the commonly used frame-stacking heuristic on all of the Atari environments tested, beating the SOTA on all but one of them.

Temporal Aware Deep Reinforcement Learning

The function approximators employed by traditional image-based Deep Reinforcement Learning (DRL) algorithms usually lack a temporal learning component and instead focus on learning the spatial…

Weakly Supervised Scene Text Detection using Deep Reinforcement Learning

A weak supervision method for scene text detection which makes use of reinforcement learning (RL), where the reward received by the RL agent is estimated by a neural network, instead of being inferred from ground-truth labels.

The Unsurprising Effectiveness of Pre-Trained Vision Models for Control

It is found that pre-trained visual representations can be competitive or even better than ground-truth state representations to train control policies, in spite of using only out-of-domain data from standard vision datasets, without any in-domain data from the deployment environments.

Hierarchical Active Tracking Control for UAVs via Deep Reinforcement Learning

This work unifies the perception and decision-making stages using a high-level controller and then leverages deep reinforcement learning to learn the mapping from raw images to high-level action commands in a V-REP-based environment.

A template for the arxiv style

Recent interest in structure solution and refinement using electron diffraction (ED) has been fuelled by its inherent advantages when applied to crystals of sub-micron size, as well as a better…

VRL3: A Data-Driven Framework for Visual Deep Reinforcement Learning

On a set of highly challenging hand manipulation tasks with sparse reward and realistic visual inputs, this framework learns 370%-1200% faster than the previous SOTA method while using an encoder that is 50 times smaller, fully demonstrating the potential of data-driven deep reinforcement learning.

Improving Sample Efficiency in Model-Free Reinforcement Learning from Images

A simple approach capable of matching state-of-the-art model-free and model-based algorithms on MuJoCo control tasks and demonstrating robustness to observational noise, surpassing existing approaches in this setting.

Decoupling Representation Learning from Reinforcement Learning

A new unsupervised learning task, called Augmented Temporal Contrast (ATC), which trains a convolutional encoder to associate pairs of observations separated by a short time difference, under image augmentations and using a contrastive loss.
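
A contrastive loss of the kind ATC uses can be sketched with an InfoNCE-style objective (the function name and NumPy formulation here are illustrative, not the paper's implementation): each anchor embedding should score highest against its own positive, an augmented view from a nearby time step, among all positives in the batch.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Illustrative InfoNCE-style contrastive loss: matching pairs sit on
    the diagonal of the similarity matrix and should dominate their row."""
    # Cosine-normalise the embeddings.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # cross-entropy on diagonal
```

The loss is low when each anchor is closest to its own positive and high when the pairing is scrambled, which is what drives the encoder to associate temporally nearby observations.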

Reinforcement Learning with Augmented Data

It is shown that augmentations such as random translate, crop, color jitter, patch cutout, random convolutions, and amplitude scale can enable simple RL algorithms to outperform complex state-of-the-art methods across common benchmarks.
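
One of the augmentations listed, random crop, can be sketched as follows (the function name and batch layout are illustrative): each image in a batch gets an independently sampled crop window, which is what makes the augmentation act as a regularizer.

```python
import numpy as np

def random_crop(imgs, out_size, rng=None):
    """Illustrative batched random crop for (B, H, W, C) image arrays:
    each image receives its own independently sampled crop offset."""
    rng = rng or np.random.default_rng()
    b, h, w, c = imgs.shape
    tops = rng.integers(0, h - out_size + 1, size=b)
    lefts = rng.integers(0, w - out_size + 1, size=b)
    return np.stack([img[t:t + out_size, l:l + out_size]
                     for img, t, l in zip(imgs, tops, lefts)])
```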

Data-Efficient Reinforcement Learning with Momentum Predictive Representations

This work trains an agent to predict its own latent state representations multiple steps into the future, using a target encoder whose parameters are an exponential moving average of the agent's online encoder parameters, and makes the predictions with a learned transition model.
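
The exponential-moving-average update behind such momentum/target encoders can be sketched as follows (parameter names are illustrative): the target network slowly tracks the online network instead of receiving gradients directly.

```python
def ema_update(target_params, online_params, tau=0.99):
    """Illustrative EMA update for a target encoder: with tau close to 1,
    the target changes slowly, stabilising the prediction targets."""
    return {k: tau * target_params[k] + (1.0 - tau) * online_params[k]
            for k in target_params}
```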

Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels

The addition of the augmentation method dramatically improves SAC's performance, enabling it to reach state-of-the-art performance on the DeepMind Control Suite, surpassing model-based methods and the recently proposed contrastive learning method CURL.

Learning Latent Dynamics for Planning from Pixels

The Deep Planning Network (PlaNet) is proposed, a purely model-based agent that learns the environment dynamics from images and chooses actions through fast online planning in latent space using a latent dynamics model with both deterministic and stochastic transition components.

Dueling Network Architectures for Deep Reinforcement Learning

This paper presents a new neural network architecture for model-free reinforcement learning that leads to better policy evaluation in the presence of many similar-valued actions and enables the RL agent to outperform the state-of-the-art on the Atari 2600 domain.
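
The dueling aggregation at the core of this architecture combines a state-value stream and an advantage stream as Q(s, a) = V(s) + A(s, a) - mean_a A(s, a); subtracting the mean advantage makes the decomposition identifiable. A minimal NumPy sketch of that aggregation step (not the full network):

```python
import numpy as np

def dueling_q(value, advantages):
    """Combine value and advantage streams into Q-values:
    Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    return value + advantages - advantages.mean(axis=-1, keepdims=True)
```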

Reinforcement Learning with Unsupervised Auxiliary Tasks

This paper significantly outperforms the previous state-of-the-art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks achieves a mean 10x speedup in learning while averaging 87% expert human performance.

Motion Perception in Reinforcement Learning with Dynamic Objects

It is shown that for continuous control tasks learning an explicit representation of motion improves the quality of the learned controller in dynamic scenarios, and that using an image difference between the current and the previous frame as an additional input leads to better results than a temporal stack of frames.
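
The image-difference input described above can be sketched as follows (the function name and channel layout are illustrative): pairing the current frame with the difference to the previous frame exposes motion explicitly, rather than leaving it implicit in a stack of raw frames.

```python
import numpy as np

def difference_observation(current, previous):
    """Illustrative motion-aware observation: concatenate the current
    frame with the frame difference along the channel axis."""
    return np.concatenate([current, current - previous], axis=-1)
```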

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, and achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.
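
The soft Bellman target at the heart of the maximum-entropy framework can be sketched as follows (variable names are illustrative): the entropy bonus -alpha * log pi augments the reward, and taking the minimum of two Q estimates, as in common SAC implementations, curbs overestimation.

```python
import numpy as np

def soft_q_target(reward, done, next_q1, next_q2, next_log_pi,
                  gamma=0.99, alpha=0.2):
    """Illustrative soft Bellman target:
    y = r + gamma * (1 - done) * (min(Q1, Q2) - alpha * log pi)."""
    next_v = np.minimum(next_q1, next_q2) - alpha * next_log_pi
    return reward + gamma * (1.0 - done) * next_v
```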