Natural Adversarial Examples

@inproceedings{Hendrycks2021NaturalAE,
  title={Natural Adversarial Examples},
  author={Dan Hendrycks and Kevin Zhao and Steven Basart and Jacob Steinhardt and Dawn Xiaodong Song},
  booktitle={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
  pages={15257-15266}
}
  • Dan Hendrycks, Kevin Zhao, D. Song
  • Published 16 July 2019
  • Computer Science
  • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
We introduce two challenging datasets that reliably cause machine learning model performance to substantially degrade. The datasets are collected with a simple adversarial filtration technique to create datasets with limited spurious cues. Our datasets’ real-world, unmodified examples transfer to various unseen models reliably, demonstrating that computer vision models have shared weaknesses. The first dataset is called IMAGENET-A and is like the ImageNet test set, but it is far more… 
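The filtration step the abstract alludes to can be made concrete with a short sketch: run a fixed pretrained classifier over candidate natural images and keep only the ones it gets wrong. This is a minimal illustration only, assuming a standard torchvision ResNet-50 as the filter model and a hypothetical candidate_images iterable of (image tensor, true label) pairs; it is not the authors' exact collection pipeline.

import torch
import torchvision.models as models

# Fixed filter model (an ImageNet-pretrained ResNet-50 is assumed here for illustration).
filter_model = models.resnet50(pretrained=True).eval()

def adversarially_filter(candidate_images):
    """Keep only the natural, unmodified images that the fixed model misclassifies."""
    kept = []
    with torch.no_grad():
        for image, true_label in candidate_images:  # image: [3, 224, 224] tensor, already preprocessed
            logits = filter_model(image.unsqueeze(0))
            predicted = logits.argmax(dim=1).item()
            if predicted != true_label:  # the model is wrong -> a hard, "adversarial" natural example
                kept.append((image, true_label))
    return kept

Images that survive such a filter form the hard evaluation set; the sketch is only meant to show why the resulting examples contain few spurious cues a model can exploit.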
Rethinking Natural Adversarial Examples for Classification Models
TLDR
This work hypothesized that a large, cluttered, and/or unusual background is an important reason why the images in this dataset are difficult to classify, and validated the hypothesis by reducing the background influence in ImageNet-A examples with object detection techniques.
Natural Adversarial Objects
TLDR
A new dataset, Natural Adversarial Objects (NAO), is introduced to evaluate the robustness of object detection models, and it is found that better performance on the MSCOCO validation set does not necessarily translate to better performance on NAO, suggesting that robustness cannot be achieved simply by training a more accurate model.
Adversarial Examples Improve Image Recognition
TLDR
This work proposes AdvProp, an enhanced adversarial training scheme that treats adversarial examples as additional training examples to prevent overfitting, and shows that AdvProp improves a wide range of models on various image recognition tasks, with larger gains for bigger models.
DAmageNet: A Universal Adversarial Dataset
TLDR
The dataset is called DAmageNet; it is the first universal adversarial dataset that beats many models trained on ImageNet and can serve as a benchmark for studying and improving the robustness of DNNs.
Defense-friendly Images in Adversarial Attacks: Dataset and Metrics for Perturbation Difficulty
TLDR
A class of robust images is reported for the first time: images that are both resilient to attacks and that recover better than random images under adversarial attacks when simple defense techniques are applied.
Geometric Properties of Adversarial Images
TLDR
This work explores a novel method for adversarial image detection using a linear-algebra approach, built on comparing the distances to class centroids for a given point and for its neighbors.
Towards Automated Testing and Robustification by Semantic Adversarial Data Generation
TLDR
Semantic adversarial editing, a method to synthesize plausible but difficult data points on which the authors' target model breaks down, is proposed with a differentiable object synthesizer that modifies the appearance of an object instance while maintaining its original pose.
How benign is benign overfitting?
TLDR
This work identifies label noise as one of the causes of adversarial vulnerability, provides theoretical and empirical evidence in support of this, and conjectures that the need for complex decision boundaries arises in part from sub-optimal representation learning.
Metrics and methods for robustness evaluation of neural networks with generative models
TLDR
This paper proposes several metrics for measuring the robustness of classifiers to natural adversarial examples, along with methods to evaluate them, and finds that the latent counterparts of adversarial robustness are associated with the accuracy of the classifier rather than with its conventional adversarial robustness, although the latter is still reflected in the properties of the found latent perturbations.

References

SHOWING 1-10 OF 92 REFERENCES
Big but Imperceptible Adversarial Perturbations via Semantic Manipulation
TLDR
Two novel methods, tAdv and cAdv, are proposed; they leverage texture transfer and colorization to generate natural perturbations with large $\mathcal{L}_p$ norms, and they are general enough to attack both image classification and image captioning tasks on the ImageNet and MSCOCO datasets.
Feature Denoising for Improving Adversarial Robustness
TLDR
It is suggested that adversarial perturbations on images lead to noise in the features constructed by these networks, and new network architectures are developed that increase adversarial robustness by performing feature denoising.
Semantic Adversarial Examples
TLDR
This paper introduces a new class of adversarial examples, namely "Semantic Adversarial Examples," as images that are arbitrarily perturbed to fool the model, but in such a way that the modified image semantically represents the same object as the original image.
Fast is better than free: Revisiting adversarial training
TLDR
This work makes the surprising discovery that it is possible to train empirically robust models using a much weaker and cheaper adversary, an approach previously believed to be ineffective, rendering the method no more costly than standard training in practice.
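For context, the "much weaker and cheaper adversary" here is a single-step, FGSM-style attack rather than a multi-step one. The following is a minimal sketch of one such training step, assuming PyTorch, inputs scaled to [0, 1], and illustrative hyperparameters and a fgsm_adversarial_training_step helper that are not taken from the paper itself.

import torch
import torch.nn.functional as F

def fgsm_adversarial_training_step(model, optimizer, x, y, epsilon=8/255, alpha=10/255):
    # Start the perturbation at a random point inside the epsilon ball.
    delta = torch.empty_like(x).uniform_(-epsilon, epsilon).requires_grad_(True)
    loss = F.cross_entropy(model(x + delta), y)
    loss.backward()
    # One gradient-sign step, then clip the perturbation back into the epsilon ball.
    delta = (delta + alpha * delta.grad.sign()).clamp(-epsilon, epsilon).detach()
    # Standard training step on the perturbed batch.
    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()

Because the attack uses a single gradient step per batch, each training step costs roughly one extra forward-backward pass, which is why the method ends up no more expensive than standard training.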
ConvNets and ImageNet Beyond Accuracy: Understanding Mistakes and Uncovering Biases
TLDR
It is experimentally demonstrated that the accuracy and robustness of ConvNets measured on ImageNet are vastly underestimated, and that explanations can mitigate the impact of misclassified adversarial examples from the perspective of the end user.
Self-Attention Generative Adversarial Networks
TLDR
The proposed SAGAN achieves the state-of-the-art results, boosting the best published Inception score from 36.8 to 52.52 and reducing Frechet Inception distance from 27.62 to 18.65 on the challenging ImageNet dataset.
Constructing Unrestricted Adversarial Examples with Generative Models
TLDR
The empirical results on the MNIST, SVHN, and CelebA datasets show that unrestricted adversarial examples can bypass strong adversarial training and certified defense methods designed for traditional adversarial attacks.
A Fourier Perspective on Model Robustness in Computer Vision
TLDR
AutoAugment, a recently proposed data augmentation policy optimized for clean accuracy, achieves state-of-the-art robustness on the CIFAR-10-C benchmark and is observed to use a more diverse set of augmentations than prior approaches.
Adversarial Machine Learning at Scale
TLDR
This research applies adversarial training to ImageNet, finds that single-step attacks are best for mounting black-box attacks, and resolves a "label leaking" effect that causes adversarially trained models to perform better on adversarial examples than on clean examples.
Adversarial Transformation Networks: Learning to Generate Adversarial Examples
TLDR
This work efficiently trains feed-forward neural networks in a self-supervised manner to generate adversarial examples against a target network or set of networks, and calls such a network an Adversarial Transformation Network (ATN).