Look where you look! Saliency-guided Q-networks for visual RL tasks
David Bertoin, Adil Zouitine, Mehdi Zouitine and Emmanuel Rachelson

Deep reinforcement learning policies, despite their outstanding efficiency in simulated visual control tasks, have shown disappointing ability to generalize across disturbances in the input training images. Changes in image statistics or distracting background elements are pitfalls that prevent generalization and real-world applicability of such control policies. We elaborate on the intuition that a good visual policy should be able to identify which pixels are important for its decision, and…



Don't Touch What Matters: Task-Aware Lipschitz Data Augmentation for Visual Reinforcement Learning

Task-aware Lipschitz Data Augmentation (TLDA) for visual RL is proposed, which explicitly identifies the task-correlated pixels (those with large Lipschitz constants) and augments only the task-irrelevant pixels for stability, improving both sample efficiency and generalization.
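The core idea of augmenting only task-irrelevant pixels can be sketched in a few lines. This is a minimal numpy sketch, not TLDA itself: the paper estimates per-pixel relevance via Lipschitz constants, whereas here `relevance` is simply assumed given, and the augmentation is plain Gaussian noise.

```python
import numpy as np

def masked_augment(obs, relevance, threshold=0.5, noise_scale=0.3, rng=None):
    """Perturb only pixels a relevance mask deems task-irrelevant.

    obs:       float image, shape (H, W), values in [0, 1]
    relevance: per-pixel task-relevance scores in [0, 1]; assumed
               precomputed (TLDA derives these from Lipschitz constants)
    """
    rng = np.random.default_rng(rng)
    irrelevant = relevance < threshold                     # pixels safe to augment
    noise = rng.normal(0.0, noise_scale, size=obs.shape)
    out = obs.copy()
    out[irrelevant] = np.clip(obs[irrelevant] + noise[irrelevant], 0.0, 1.0)
    return out

obs = np.full((4, 4), 0.5)
relevance = np.zeros((4, 4))
relevance[1:3, 1:3] = 1.0          # pretend the center patch is task-relevant
aug = masked_augment(obs, relevance, rng=0)
assert np.allclose(aug[1:3, 1:3], obs[1:3, 1:3])  # relevant pixels untouched
```

The invariant is that task-relevant pixels pass through unchanged, so the augmented view remains label-consistent for the control task.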

SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies

This work considers robust policy learning which targets zero-shot generalization to unseen visual environments with large distributional shift and proposes SECANT, a novel self-expert cloning technique that leverages image augmentation in two stages to decouple robust representation learning from policy optimization.

Robust Deep Reinforcement Learning via Multi-View Information Bottleneck

The results show that KL balancing can improve training of an RSSM with a contrastive-learning or mutual-information-maximization objective, and the approach outperforms well-established baselines for generalization to unseen environments on the Procgen benchmark.

Unsupervised Visual Attention and Invariance for Reinforcement Learning

The results demonstrate that it is not only possible to learn domain-invariant vision without any supervision, but that freeing RL from visual distractions also makes the policy more focused and far more effective.

Are Gradient-based Saliency Maps Useful in Deep Reinforcement Learning?

This work brings some of the best-known visualization methods from the field of image classification to the area of deep reinforcement learning, and shows that problems in a DRL agent's decision-making can be recognized with the help of gradient visualization techniques.
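Gradient saliency scores each input pixel by the magnitude of the value function's sensitivity to it, typically computed via autograd in a deep-learning framework. As an illustration only, the sketch below approximates the same quantity by finite differences on a toy linear stand-in for a Q-function (`toy_q` is hypothetical, not from any of these papers).

```python
import numpy as np

def toy_q(obs, w):
    """Stand-in Q-value: a linear readout of the observation."""
    return float(np.sum(w * obs))

def saliency_map(q_fn, obs, eps=1e-4):
    """Finite-difference |dQ/dpixel|; real implementations use autograd."""
    sal = np.zeros_like(obs)
    base = q_fn(obs)
    for idx in np.ndindex(obs.shape):
        bumped = obs.copy()
        bumped[idx] += eps
        sal[idx] = abs(q_fn(bumped) - base) / eps
    return sal

w = np.zeros((3, 3))
w[1, 1] = 2.0                           # Q depends only on the center pixel
obs = np.random.default_rng(0).random((3, 3))
sal = saliency_map(lambda x: toy_q(x, w), obs)
assert sal.argmax() == 4                # center pixel is the most salient
```

For a linear Q-function the saliency recovers the weight magnitudes exactly, which is why the map peaks at the one pixel the toy value function actually reads.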

Measuring Visual Generalization in Continuous Control from Pixels

The empirical analysis shows that current methods struggle to generalize across a diverse set of visual changes, and finds that data augmentation techniques outperform self-supervised learning approaches and that stronger image transformations provide better visual generalization.

Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation

This paper investigates the causes of instability when using data augmentation in common off-policy RL algorithms, proposes a simple yet effective technique for stabilizing this class of algorithms under augmentation, and achieves generalization results competitive with state-of-the-art methods for image-based RL in environments with unseen visuals.
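A recurring augmentation in this line of work is the random shift: pad the observation, then crop back to the original size at a random offset. A minimal numpy sketch (assuming single-channel float images; practical versions operate on batched tensors inside the replay pipeline):

```python
import numpy as np

def random_shift(img, pad=2, rng=None):
    """Random-shift augmentation: replicate-pad by `pad` pixels,
    then crop back to the original size at a random offset."""
    rng = np.random.default_rng(rng)
    h, w = img.shape
    padded = np.pad(img, pad, mode="edge")            # replicate border pixels
    dy, dx = rng.integers(0, 2 * pad + 1, size=2)     # random crop origin
    return padded[dy:dy + h, dx:dx + w]

img = np.arange(16.0).reshape(4, 4)
aug = random_shift(img, pad=2, rng=1)
assert aug.shape == img.shape
```

Because the crop keeps the output the same size as the input, the augmented observation drops into the Q-network unchanged; the shift only perturbs spatial position, which is what makes it mild enough to use on every training sample.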

Local Feature Swapping for Generalization in Reinforcement Learning

It is demonstrated, on the OpenAI Procgen Benchmark, that RL agents trained with the CLOP method exhibit robustness to visual changes and better generalization properties than agents trained using other state-of-the-art regularization techniques.

Automatic Data Augmentation for Generalization in Reinforcement Learning

This paper introduces three approaches for automatically finding an effective augmentation for any RL task, combined with two novel regularization terms for the policy and value function that are required to make the use of data augmentation theoretically sound for actor-critic algorithms.

The Distracting Control Suite - A Challenging Benchmark for Reinforcement Learning from Pixels

The experiments show that current RL methods for vision-based control perform poorly under distractions, and that their performance decreases with increasing distraction complexity, showing that new methods are needed to cope with the visual complexities of the real world.