Corpus ID: 235458133

SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies

@inproceedings{Fan2021SECANTSC,
  title={SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies},
  author={Linxi Fan and Guanzhi Wang and De-An Huang and Zhiding Yu and Li Fei-Fei and Yuke Zhu and Anima Anandkumar},
  booktitle={ICML},
  year={2021}
}
Generalization has been a long-standing challenge for reinforcement learning (RL). Visual RL, in particular, can be easily distracted by irrelevant factors in high-dimensional observation space. In this work, we consider robust policy learning which targets zero-shot generalization to unseen visual environments with large distributional shift. We propose SECANT, a novel self-expert cloning technique that leverages image augmentation in two stages to decouple robust representation learning from…
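The two-stage recipe lends itself to a compact sketch. The following is a minimal, hypothetical PyTorch rendition of the second (cloning) stage, assuming an expert already trained by RL under weak augmentation; the network modules, the augment_strong callable, and the MSE cloning loss are illustrative placeholders rather than the authors' exact implementation.

import torch
import torch.nn.functional as F

def secant_cloning_step(student, expert, obs, augment_strong, optimizer):
    # Stage 2 of self-expert cloning: the frozen expert labels clean frames,
    # while the student must match those actions on strongly augmented frames.
    with torch.no_grad():
        target_actions = expert(obs)                 # expert sees clean observations
    pred_actions = student(augment_strong(obs))      # student sees heavy augmentation
    loss = F.mse_loss(pred_actions, target_actions)  # supervised imitation loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()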

References

Showing 1-10 of 74 references
Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation
TLDR: Shows that separating the visual transfer task from the control policy yields substantially better sample efficiency and transfer behavior, allowing an agent trained on the source task to transfer well to the target tasks.
Decoupling Representation Learning from Reinforcement Learning
TLDR: Proposes Augmented Temporal Contrast (ATC), a new unsupervised learning task that trains a convolutional encoder to associate pairs of observations separated by a short time difference, under image augmentations and using a contrastive loss.
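A rough sketch of what such a contrastive objective can look like in PyTorch, assuming batched image tensors; the encoder and augment callables are placeholders, and the paper's full method additionally uses a momentum target encoder, omitted here.

import torch
import torch.nn.functional as F

def atc_loss(encoder, obs_t, obs_tk, augment, temperature=0.1):
    # Associate each observation with the one k steps later: matched pairs
    # along the diagonal are positives, all other pairs in the batch negatives.
    anchors   = F.normalize(encoder(augment(obs_t)), dim=1)
    positives = F.normalize(encoder(augment(obs_tk)), dim=1)
    logits = anchors @ positives.t() / temperature       # (B, B) similarities
    labels = torch.arange(obs_t.size(0), device=obs_t.device)
    return F.cross_entropy(logits, labels)               # InfoNCE-style loss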
Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck
TLDR: Proposes Selective Noise Injection (SNI), which maintains the regularizing effect of injected noise while mitigating its adverse effects on gradient quality, and demonstrates that the Information Bottleneck is a particularly well-suited regularization technique for RL, as it is effective in the low-data regime encountered early in training.
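One way to picture SNI is as a convex combination of a loss computed with the stochastic regularizer active and one with it suspended. The sketch below, assuming dropout as the injected noise and a generic policy_loss_fn, is a simplification that omits the paper's importance weighting.

def sni_loss(policy_loss_fn, model, batch, lam=0.5):
    # Loss with noise (e.g. dropout) active.
    model.train()
    loss_noisy = policy_loss_fn(model, batch)
    # Loss with noise suspended; gradients still flow in eval mode.
    model.eval()
    loss_clean = policy_loss_fn(model, batch)
    model.train()
    # Mix the two so the noise regularizes without degrading the gradient.
    return lam * loss_clean + (1.0 - lam) * loss_noisy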
Asymmetric Actor Critic for Image-Based Robot Learning
TLDR: Exploits full state observability in the simulator to train better policies that take only partial observations (RGB-D images) as input, combines this with domain randomization, and shows real-robot experiments on tasks such as picking, pushing, and moving a block.
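The core idea fits in a few lines: give the critic privileged simulator state while the actor sees only images. A hypothetical PyTorch sketch, where image_encoder is assumed to be any module that returns flat features and exposes an out_dim attribute:

import torch
import torch.nn as nn

class AsymmetricActorCritic(nn.Module):
    def __init__(self, image_encoder, state_dim, action_dim, hidden=256):
        super().__init__()
        # Actor: partial observations (images) only, so it can run on a real robot.
        self.actor = nn.Sequential(
            image_encoder,
            nn.Linear(image_encoder.out_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )
        # Critic: privileged full state, available only inside the simulator.
        self.critic = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def act(self, image_obs):
        return self.actor(image_obs)

    def value(self, full_state, action):
        return self.critic(torch.cat([full_state, action], dim=-1))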
Improving Generalization in Reinforcement Learning with Mixture Regularization
TLDR: Introduces mixreg, a simple approach that trains agents on a mixture of observations from different training environments and imposes linearity constraints on the observation interpolations and the supervision, increasing data diversity more effectively and helping to learn smoother policies.
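The interpolation itself is a one-liner per tensor. A minimal sketch, assuming observation batches drawn from two different training environments and a shared mixing coefficient sampled from a Beta distribution (the standard mixup recipe):

import torch

def mixreg(obs_a, obs_b, target_a, target_b, alpha=0.2):
    # Draw one mixing coefficient and apply it to both the observations
    # and their supervision (e.g. returns or value targets).
    lam = torch.distributions.Beta(alpha, alpha).sample()
    mixed_obs = lam * obs_a + (1 - lam) * obs_b
    mixed_target = lam * target_a + (1 - lam) * target_b
    return mixed_obs, mixed_target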
Network Randomization: A Simple Technique for Generalization in Deep Reinforcement Learning
TLDR: Presents a simple technique that improves the generalization ability of deep RL agents by introducing a randomized (convolutional) neural network that randomly perturbs input observations, enabling trained agents to adapt to new domains by learning robust features invariant across varied and randomized environments.
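A bare-bones version of such a perturbation layer, assuming image batches in NCHW layout; the filter is redrawn on every call and never trained, which is the essence of the technique:

import torch
import torch.nn as nn
import torch.nn.functional as F

def random_conv(x, kernel_size=3):
    # Sample a fresh convolution each call; downstream features must become
    # invariant to this ever-changing visual distortion.
    c = x.size(1)
    weight = torch.empty(c, c, kernel_size, kernel_size, device=x.device)
    nn.init.xavier_normal_(weight)
    return F.conv2d(x, weight, padding=kernel_size // 2)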
RL-CycleGAN: Reinforcement Learning Aware Simulation-to-Real
TLDR: Obtains RL-CycleGAN, a new approach for simulation-to-real-world transfer in reinforcement learning, by incorporating an RL-scene consistency loss into unsupervised domain translation, ensuring that the translation is invariant with respect to the Q-values associated with the image.
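The consistency term can be sketched as a penalty on Q-value drift under translation. A hypothetical rendition, assuming a discrete-action q_net that maps an image batch to per-action Q-values and a sim-to-real generator; the paper's full loss compares several translated variants, which is omitted here:

import torch.nn.functional as F

def rl_scene_consistency_loss(q_net, generator, sim_obs):
    # The translated image should leave the Q-function's view of the scene
    # unchanged, so the generator cannot alter task-relevant content.
    translated = generator(sim_obs)
    q_sim = q_net(sim_obs).detach()   # treat original Q-values as the target
    q_translated = q_net(translated)
    return F.mse_loss(q_translated, q_sim)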
Quantifying Generalization in Reinforcement Learning
TLDR: Shows that deeper convolutional architectures improve generalization, as do methods traditionally found in supervised learning, including L2 regularization, dropout, data augmentation, and batch normalization.
Illuminating Generalization in Deep Reinforcement Learning through Procedural Level Generation
TLDR: Shows that, for some games, procedural level generation enables generalization to new levels within the same distribution, and that better performance can be achieved with less data by adapting the difficulty of the levels to the performance of the agent.
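The difficulty adaptation described here can be as simple as a feedback rule on recent returns. A hypothetical sketch, where the threshold and step size are arbitrary choices, not values from the paper:

def adapt_difficulty(difficulty, mean_return, threshold=0.7,
                     step=0.05, low=0.0, high=1.0):
    # Push the level generator harder when the agent is succeeding and
    # back off when it is failing, keeping training at the edge of ability.
    if mean_return >= threshold:
        return min(high, difficulty + step)
    return max(low, difficulty - step)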
End-to-End Training of Deep Visuomotor Policies
TLDR: Develops a method for learning policies that map raw image observations directly to torques at the robot's motors, trained with a partially observed guided policy search method whose supervision is provided by a simple trajectory-centric reinforcement learning method.
...