Corpus ID: 44220250

Adversarial Examples that Fool both Computer Vision and Time-Limited Humans

@inproceedings{Elsayed2018AdversarialET,
  title={Adversarial Examples that Fool both Computer Vision and Time-Limited Humans},
  author={Gamaleldin F. Elsayed and Shreya Shankar and Brian Cheung and Nicolas Papernot and Alexey Kurakin and Ian J. Goodfellow and Jascha Sohl-Dickstein},
  booktitle={NeurIPS},
  year={2018}
}
Machine learning models are vulnerable to adversarial examples: small changes to images can cause computer vision models to make mistakes such as identifying a school bus as an ostrich. However, it is still an open question whether humans are prone to similar mistakes. Here, we address this question by leveraging recent techniques that transfer adversarial examples from computer vision models with known parameters and architecture to other models with unknown parameters and architecture, and by… 
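
As a concrete illustration of the transfer setting the abstract describes, here is a minimal sketch in PyTorch: a perturbation is crafted with a single gradient-sign step (the fast gradient sign method from the "Explaining and Harnessing Adversarial Examples" reference below) on a source model whose parameters are known, then tested against an independently trained target model. The toy CNNs and random data are placeholders; the paper itself attacks an ensemble of ImageNet classifiers with retina-inspired preprocessing, which this sketch does not reproduce.

```python
# Minimal transfer-attack sketch: craft a perturbation on a "source" model
# whose gradients we can access, then check whether it also changes the
# prediction of an independently trained "target" model.
# Toy models and random data only.
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_toy_cnn(num_classes: int = 10) -> nn.Module:
    """A stand-in classifier; any differentiable image model works here."""
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(16, num_classes),
    )


def fgsm_perturb(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                 eps: float) -> torch.Tensor:
    """One gradient-sign step computed against the source model."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()


if __name__ == "__main__":
    source, target = make_toy_cnn(), make_toy_cnn()  # independently initialized
    x = torch.rand(8, 3, 32, 32)                     # batch of "images" in [0, 1]
    y = torch.randint(0, 10, (8,))

    x_adv = fgsm_perturb(source, x, y, eps=8 / 255)

    # The transfer question: do perturbations built against `source`
    # also flip the labels assigned by `target`?
    clean_pred = target(x).argmax(dim=1)
    adv_pred = target(x_adv).argmax(dim=1)
    print("labels changed on target:", (clean_pred != adv_pred).sum().item())
```

In the paper's setting, perturbations that transfer strongly across such independently trained models are the ones then shown to time-limited human observers.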

Citations

Adversarial Examples on Object Recognition
TLDR
The hypotheses behind their existence, the methods used to construct or protect against them, and the capacity to transfer adversarial examples between different machine learning models are introduced.
What do adversarial images tell us about human vision?
TLDR
It is shown that agreement between humans and DCNNs is much weaker and more variable than previously reported, and that the weak agreement is contingent on the choice of adversarial images and the design of the experiment.
Adversarial Examples on Object Recognition: A Comprehensive Survey
TLDR
The hypotheses behind their existence, the methods used to construct or protect against them, and the capacity to transfer adversarial examples between different machine learning models are introduced to provide a comprehensive and self-contained survey of this growing field of research.
Humans cannot decipher adversarial images: Revisiting Zhou and Firestone (2019)
TLDR
Two experiments are reported that show that the level of agreement between human and DCNN classification is driven by how the experimenter chooses the adversarial images and how they choose the labels given to humans for classification.
Adversarial images for the primate brain
TLDR
This work designed adversarial images to fool primate vision and modified images to match their model-predicted neuronal responses to a target category, such as monkey faces, showing that a model of neuronal activity can selectively direct primate visual behavior.
Explaining Classifiers using Adversarial Perturbations on the Perceptual Ball
TLDR
This work presents a simple regularization of adversarial perturbations based upon the perceptual loss demonstrating that perceptually regularized counterfactuals are an effective explanation for image-based classifiers.
Attack Type Agnostic Perceptual Enhancement of Adversarial Images
TLDR
The proposed method is attack-type agnostic and can be used in combination with existing attacks in the literature; experiments show that the generated adversarial images have lower Euclidean distance values while maintaining the same adversarial attack performance.
Adversarial attacks hidden in plain sight
TLDR
A technique is presented for hiding adversarial attacks in regions of high visual complexity, such that they are imperceptible to human visual perception even for an astute observer.
Perturbations on the Perceptual Ball
TLDR
A simple regularization of adversarial perturbations based upon the perceptual loss is presented, reinforcing the connection between explainable AI and adversarial perturbations.
Bio-inspired Robustness: A Review
TLDR
A set of criteria for the proper evaluation of DCNNs is proposed, and different models are analyzed according to these criteria, with the goal of bringing DCNNs one step closer to a model of human vision.
...
...

References

SHOWING 1-10 OF 49 REFERENCES
Adversarial Examples that Fool both Human and Computer Vision
TLDR
It is found that adversarial examples that strongly transfer across computer vision models influence the classifications made by time-limited human observers.
Synthesizing Robust Adversarial Examples
TLDR
The existence of robust 3D adversarial objects is demonstrated, and the first algorithm for synthesizing examples that are adversarial over a chosen distribution of transformations is presented; the algorithm also synthesizes two-dimensional adversarial images that are robust to noise, distortion, and affine transformation.
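
For orientation, a rough sketch of the expectation-over-transformation idea this reference describes: the attack ascends the loss averaged over randomly sampled transformations, so the perturbation survives them. Here `model` is assumed to be any differentiable PyTorch classifier, and random pixel shifts plus noise stand in for the affine and 3D transformations used in the paper.

```python
# Sketch of Expectation Over Transformation (EOT): maximize the loss averaged
# over a distribution of random transformations instead of a single rendering.
import torch
import torch.nn.functional as F


def random_transform(x: torch.Tensor) -> torch.Tensor:
    """Sample one transformation: a small translation plus pixel noise."""
    dx, dy = torch.randint(-3, 4, (2,)).tolist()
    shifted = torch.roll(x, shifts=(dx, dy), dims=(-2, -1))
    return (shifted + 0.02 * torch.randn_like(shifted)).clamp(0.0, 1.0)


def eot_attack(model, x, y, eps=8 / 255, steps=20, samples=10, lr=1 / 255):
    """Projected gradient ascent on the transformation-averaged loss."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = sum(F.cross_entropy(model(random_transform(x + delta)), y)
                   for _ in range(samples)) / samples
        loss.backward()
        with torch.no_grad():
            delta += lr * delta.grad.sign()
            delta.clamp_(-eps, eps)                       # stay in the eps-ball
            delta.copy_((x + delta).clamp(0.0, 1.0) - x)  # keep a valid image
        delta.grad.zero_()
    return (x + delta).detach()
```
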
Explaining and Harnessing Adversarial Examples
TLDR
It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
Adversarial Machine Learning at Scale
TLDR
This research applies adversarial training to ImageNet, finds that single-step attacks are the best for mounting black-box attacks, and resolves a "label leaking" effect that causes adversarially trained models to perform better on adversarial examples than on clean examples.
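
A minimal sketch of the adversarial-training step this summary refers to, assuming an existing PyTorch `model` and `optimizer`: each batch mixes clean examples with single-step adversarial examples, and the perturbation is crafted against the model's own predicted label, which is one way to sidestep the label-leaking effect mentioned above.

```python
# One adversarial-training step: train on a mixture of clean examples and
# single-step adversarial examples crafted against the predicted label.
import torch
import torch.nn.functional as F


def adversarial_training_step(model, optimizer, x, y, eps=4 / 255):
    # Single-step perturbation against the model's current prediction.
    x_req = x.clone().detach().requires_grad_(True)
    logits = model(x_req)
    pred = logits.argmax(dim=1).detach()
    F.cross_entropy(logits, pred).backward()
    x_adv = (x_req + eps * x_req.grad.sign()).clamp(0.0, 1.0).detach()

    # Half clean, half adversarial loss on the true labels.
    optimizer.zero_grad()
    loss = 0.5 * (F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()
```
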
Adversarial examples in the physical world
TLDR
It is found that a large fraction of adversarial examples are classified incorrectly even when perceived through a camera, which shows that even in physical-world scenarios, machine learning systems are vulnerable to adversarial examples.
The Limitations of Deep Learning in Adversarial Settings
TLDR
This work formalizes the space of adversaries against deep neural networks (DNNs) and introduces a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs.
Ensemble Adversarial Training: Attacks and Defenses
TLDR
This work finds that adversarial training remains vulnerable both to black-box attacks, in which perturbations computed on undefended models are transferred to the defended model, and to a powerful novel single-step attack that escapes the non-smooth vicinity of the input data via a small random step.
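
The single-step attack described here (often written R+FGSM) can be sketched as follows, assuming any differentiable PyTorch classifier `model`: a small random step first moves the input away from the sharply curved loss surface at the data point, and the gradient-sign step then spends the remaining perturbation budget.

```python
# Sketch of the random-step-then-gradient-step attack (R+FGSM).
import torch
import torch.nn.functional as F


def rand_fgsm(model, x, y, eps=8 / 255, alpha=4 / 255):
    # Random step of size alpha, keeping the result a valid image.
    x_rand = (x + alpha * torch.sign(torch.randn_like(x))).clamp(0.0, 1.0)
    x_rand = x_rand.detach().requires_grad_(True)

    # Gradient-sign step with the remaining budget eps - alpha.
    F.cross_entropy(model(x_rand), y).backward()
    x_adv = x_rand + (eps - alpha) * x_rand.grad.sign()

    # Project back into the eps-ball around the original input.
    x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)
    return x_adv.clamp(0.0, 1.0).detach()
```
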
Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks
TLDR
Two feature squeezing methods are explored: reducing the color bit depth of each pixel and spatial smoothing, which are inexpensive and complementary to other defenses, and can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks.
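
A small sketch of the detection idea summarized above, with `model` assumed to map an H x W x C image in [0, 1] to a probability vector and the threshold treated as a tunable assumption: the input is flagged when its prediction changes too much under either squeezer.

```python
# Feature-squeezing detection sketch: compare predictions on the raw input
# with predictions on bit-depth-reduced and median-smoothed copies.
import numpy as np
from scipy.ndimage import median_filter


def reduce_bit_depth(x: np.ndarray, bits: int = 4) -> np.ndarray:
    """Quantize pixel values in [0, 1] to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels


def spatial_smooth(x: np.ndarray, size: int = 2) -> np.ndarray:
    """Median smoothing applied per channel (x is H x W x C in [0, 1])."""
    return median_filter(x, size=(size, size, 1))


def looks_adversarial(model, x: np.ndarray, threshold: float = 1.0) -> bool:
    """Flag the input if any squeezed copy moves the prediction too far (L1)."""
    p = model(x)
    distances = [np.abs(p - model(squeeze(x))).sum()
                 for squeeze in (reduce_bit_depth, spatial_smooth)]
    return max(distances) > threshold
```
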
Crafting adversarial input sequences for recurrent neural networks
TLDR
This paper investigates adversarial input sequences for recurrent neural networks processing sequential data and shows that the classes of algorithms introduced previously to craft adversarial samples misclassified by feed-forward neural networks can be adapted to recurrent neural networks.
Delving into Transferable Adversarial Examples and Black-box Attacks
TLDR
This work is the first to conduct an extensive study of transferability over large models and a large-scale dataset, and it is also the first to study the transferability of targeted adversarial examples with their target labels.
...
...