The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization

Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Lixuan Zhu, Samyak Parajuli, Mike Guo, Dawn Xiaodong Song, Jacob Steinhardt, Justin Gilmer. 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
We introduce four new real-world distribution shift datasets consisting of changes in image style, image blurriness, geographic location, camera operation, and more. With our new datasets, we take stock of previously proposed methods for improving out-of-distribution robustness and put them to the test. We find that using larger models and artificial data augmentations can improve robustness on real-world distribution shifts, contrary to claims in prior work. We find improvements in artificial… 
Visual correspondence-based explanations improve AI robustness and human-AI team accuracy
This work proposes two novel architectures of self-interpretable image classifiers that first explain and then predict by harnessing the visual correspondences between a query image and exemplars, and shows that complementary human-AI team accuracy higher than either AI alone or humans alone can be achieved on ImageNet and CUB image classification tasks.
Efficient Test-Time Model Adaptation without Forgetting
An active sample selection criterion is proposed to identify reliable and non-redundant samples, on which the model is updated to minimize the entropy loss for test-time adaptation, and a Fisher regularizer is introduced to constrain important model parameters from drastic changes.
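The selection-plus-regularization recipe summarized above can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation; the entropy threshold `e_max` and regularization weight `lam` are hypothetical knobs standing in for the method's actual hyperparameters:

```python
import numpy as np

def entropy(probs):
    # Shannon entropy of each row of an (N, C) probability matrix
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def select_reliable(probs, e_max):
    # Active sample selection: keep only low-entropy (confident) predictions
    # for the adaptation update
    return entropy(probs) < e_max

def adaptation_loss(probs, theta, theta_src, fisher, lam):
    # Entropy loss on the selected samples plus a Fisher-weighted anchor
    # that discourages drastic changes to parameters deemed important
    # for the source task
    ent = entropy(probs).mean()
    reg = lam * np.sum(fisher * (theta - theta_src) ** 2)
    return ent + reg
```

With `theta == theta_src` the regularizer vanishes and the loss reduces to the mean prediction entropy, which is the quantity minimized at test time.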
Classifiers trained using the proposed AdversarialAugment method in conjunction with prior methods improve upon the state of the art on common image corruption benchmarks (measured in expectation on CIFAR-10-C) and also improve worst-case performance against ℓp-norm bounded perturbations on both CIFAR-10 and ImageNet.
Vision Transformers are Robust Learners
This work uses six diverse ImageNet datasets concerning robust classification to conduct a comprehensive performance comparison of ViT models against state-of-the-art convolutional neural networks (CNNs) such as Big Transfer (Kolesnikov et al. 2020), and presents analyses that provide both quantitative and qualitative indications of why ViTs are indeed more robust learners.
Discrete Representations Strengthen Vision Transformer Robustness
Experimental results demonstrate that adding discrete representation on four architecture variants strengthens ViT robustness by up to 12% across seven ImageNet robustness benchmarks while maintaining the performance on ImageNet.
Combined Scaling for Zero-shot Transfer Learning
We present a combined scaling method called BASIC that achieves 85.7% top-1 zero-shot accuracy on the ImageNet ILSVRC-2012 validation set, surpassing the best-published zero-shot models – CLIP and
Using Synthetic Corruptions to Measure Robustness to Natural Distribution Shifts
This paper proposes a methodology to build synthetic corruption benchmarks that make robustness estimations more correlated with robustness to real-world distribution shifts.
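One common way to quantify how well a synthetic benchmark tracks real-world robustness is a rank correlation of model scores across the two settings. A minimal NumPy sketch (Spearman correlation computed as Pearson correlation of ranks, assuming distinct scores; the inputs would be, e.g., per-model accuracies on the synthetic and the real shift):

```python
import numpy as np

def rank(x):
    # Rank of each value in x (0 = smallest); assumes distinct values
    order = np.argsort(x)
    r = np.empty_like(order)
    r[order] = np.arange(len(x))
    return r

def spearman(x, y):
    # Spearman rank correlation between two score lists, e.g. per-model
    # accuracy on a synthetic benchmark vs. on a real distribution shift
    rx = rank(np.asarray(x)).astype(float)
    ry = rank(np.asarray(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(np.sum(rx * ry) / np.sqrt(np.sum(rx**2) * np.sum(ry**2)))
```

A correlation near 1 indicates that models ranked by synthetic-corruption robustness are ranked the same way under the real shift.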
Deep Image Comparator: Learning to Visualize Editorial Change
A robust near-duplicate search for matching a potentially manipulated image circulating online to an image within a trusted database of originals is described and effective retrieval and comparison of benign transformed and manipulated images are demonstrated, over a dataset of millions of photographs.
Language Guided Out-of-Distribution Detection
Towards Robust Vision Transformer
Robust Vision Transformer (RVT) is proposed, which is a new vision transformer and has superior performance with strong robustness and generalization ability compared with previous ViTs and state-of-the-art CNNs.


Do ImageNet Classifiers Generalize to ImageNet?
The results suggest that the accuracy drops are not caused by adaptivity, but by the models' inability to generalize to slightly "harder" images than those found in the original test sets.
ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness
It is shown that ImageNet-trained CNNs are strongly biased towards recognising textures rather than shapes, which is in stark contrast to human behavioural evidence and reveals fundamentally different classification strategies.
Benchmarking Neural Network Robustness to Common Corruptions and Perturbations
This paper standardizes and expands the corruption robustness topic, while showing which classifiers are preferable in safety-critical applications, and proposes a new dataset called ImageNet-P which enables researchers to benchmark a classifier's robustness to common perturbations.
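The benchmark aggregates performance with a mean corruption error (mCE): for each corruption type, error rates are summed over the severity levels and normalized by the same sum for a baseline model (AlexNet in the original benchmark), then averaged over corruption types. A minimal sketch with illustrative dictionaries mapping corruption names to per-severity error rates:

```python
def corruption_error(err, err_baseline):
    # CE for one corruption type: error summed over severity levels,
    # normalized by the baseline model's summed error
    return sum(err) / sum(err_baseline)

def mean_corruption_error(errs, errs_baseline):
    # mCE: average the normalized CE over all corruption types
    ces = [corruption_error(errs[c], errs_baseline[c]) for c in errs]
    return sum(ces) / len(ces)
```

The normalization makes scores comparable across corruption types of very different intrinsic difficulty; an mCE below 1 means the model is more corruption-robust than the baseline.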
DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images
A strong baseline is proposed, called Match R-CNN, which builds upon Mask R-CNN to solve the above four tasks in an end-to-end manner and address the issues of DeepFashion2.
Enhanced Deep Residual Networks for Single Image Super-Resolution
This paper develops an enhanced deep super-resolution network (EDSR) with performance exceeding that of current state-of-the-art SR methods, and proposes a new multi-scale deep super-resolution system (MDSR) and training method, which can reconstruct high-resolution images at different upscaling factors in a single model.
Lossy Image Compression with Compressive Autoencoders
It is shown that minimal changes to the loss are sufficient to train deep autoencoders competitive with JPEG 2000 and outperforming recently proposed approaches based on RNNs, and furthermore computationally efficient thanks to a sub-pixel architecture, which makes it suitable for high-resolution images.
A Simple Way to Make Neural Networks Robust Against Diverse Image Corruptions
It is demonstrated that a simple but properly tuned training with additive Gaussian and Speckle noise generalizes surprisingly well to unseen corruptions, easily reaching the previous state of the art on the corruption benchmark ImageNet-C (with ResNet50) and on MNIST-C.
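The augmentation itself is straightforward: additive Gaussian noise and multiplicative (speckle) noise applied during training. A minimal NumPy sketch under that reading, with a hypothetical per-image coin flip between the two noise types (the paper's actual tuning of noise levels is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_noise(x, sigma):
    # Additive Gaussian noise, clipped back to the valid [0, 1] image range
    return np.clip(x + rng.normal(0.0, sigma, x.shape), 0.0, 1.0)

def speckle_noise(x, sigma):
    # Multiplicative (speckle) noise: each pixel is perturbed by
    # x * N(0, sigma), i.e. noise scaled by the pixel intensity
    return np.clip(x + x * rng.normal(0.0, sigma, x.shape), 0.0, 1.0)

def augment_batch(batch, sigma=0.1):
    # Randomly apply one of the two noise types per image
    out = []
    for x in batch:
        noise_fn = gaussian_noise if rng.random() < 0.5 else speckle_noise
        out.append(noise_fn(x, sigma))
    return np.stack(out)
```

The key point of the paper is that even this simple scheme, once the noise magnitude is tuned properly, transfers surprisingly well to corruption types never seen during training.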
Identifying Statistical Bias in Dataset Replication
This work studies ImageNet-v2, a replication of the ImageNet dataset on which models exhibit a significant drop in accuracy, even after controlling for a standard human-in-the-loop measure of data quality.
Shortcut Learning in Deep Neural Networks
A set of recommendations for model interpretation and benchmarking is developed, highlighting recent advances in machine learning to improve robustness and transferability from the lab to real-world applications.