Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods

@article{Carlini2017AdversarialEA,
  title={Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods},
  author={Nicholas Carlini and David A. Wagner},
  journal={Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security},
  year={2017}
}
  • Nicholas Carlini, David A. Wagner
  • Published 20 May 2017
  • Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security
Neural networks are known to be vulnerable to adversarial examples: inputs that are close to natural inputs but classified incorrectly. In order to better understand the space of adversarial examples, we survey ten recent proposals that are designed for detection and compare their efficacy. We show that all can be defeated by constructing new loss functions. We conclude that adversarial examples are significantly harder to detect than previously appreciated, and the properties believed to be intrinsic to adversarial examples are in fact not. Finally, we propose several simple guidelines for evaluating future proposed defenses.
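As a rough illustration of what "constructing new loss functions" can look like, the sketch below folds a hypothetical detector logit into a C&W-style margin loss. The function names and the exact combination are illustrative assumptions, not the paper's formulation.

```python
import torch

def adaptive_attack_loss(classifier_logits, detector_logit, target, kappa=0.0):
    """C&W-style margin loss extended so the attack must fool the classifier
    and evade a detector at the same time (illustrative sketch only).

    classifier_logits: (num_classes,) logits of the victim classifier for one input
    detector_logit:    scalar logit of a detector (> 0 means "flagged as adversarial")
    target:            class index the attack wants the classifier to output
    """
    # Margin term: the target logit should exceed every other class logit by kappa.
    mask = torch.ones_like(classifier_logits, dtype=torch.bool)
    mask[target] = False
    best_other = classifier_logits[mask].max()
    classify_term = torch.clamp(best_other - classifier_logits[target], min=-kappa)

    # Detection term: push the detector logit below zero ("looks clean").
    detect_term = torch.clamp(detector_logit, min=-kappa)

    # Minimizing the sum attacks classifier and detector simultaneously.
    return classify_term + detect_term
```

In practice such a term would be added to an L2 distortion penalty and minimized over the perturbation by gradient descent, as in the original C&W attack.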

Citations

Deep neural rejection against adversarial examples
TLDR
This work proposes a deep neural rejection mechanism to detect adversarial examples, based on the idea of rejecting samples that exhibit anomalous feature representations at different network layers, and empirically shows that this approach outperforms previously proposed methods that detect adversarial examples by analyzing only the feature representation provided by the output network layer.
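To make the rejection idea concrete, here is a minimal, hypothetical sketch that rejects inputs whose layer-wise features fall far from the nearest class centroid. The cited method combines learned classifiers over several layers' representations, so this centroid-distance rule is only a simplification of the idea.

```python
import numpy as np

def fit_rejector(features_by_layer, labels):
    """Fit per-layer, per-class centroids and distance thresholds on clean data.

    features_by_layer: list of (n_samples, d_layer) arrays, one per monitored layer
    labels:            (n_samples,) integer class labels, assumed to be 0..C-1
    Returns a list of (centroids, threshold) pairs, one per layer.
    """
    params = []
    for feats in features_by_layer:
        centroids = np.stack([feats[labels == c].mean(axis=0)
                              for c in np.unique(labels)])
        # Distance of each clean sample to its own class centroid.
        d = np.linalg.norm(feats - centroids[labels], axis=1)
        params.append((centroids, np.percentile(d, 95)))  # reject the top 5% tail
    return params

def reject(features_by_layer, params):
    """Flag a single input as anomalous if, at any layer, its distance to the
    nearest class centroid exceeds that layer's threshold."""
    for feats, (centroids, thr) in zip(features_by_layer, params):
        if np.linalg.norm(centroids - feats, axis=1).min() > thr:
            return True
    return False
```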
Adversarial Examples Detection Beyond Image Space
TLDR
This work proposes a method that goes beyond image space via a two-stream architecture, in which the image stream focuses on the pixel artifacts and the gradient stream copes with the confidence artifacts; it outperforms existing methods under oblivious attacks and is shown to be effective in defending against omniscient attacks.
Detecting Adversarial Examples through Nonlinear Dimensionality Reduction
TLDR
This work proposes a detection method based on combining non-linear dimensionality reduction and density estimation techniques and empirical findings show that the proposed approach is able to effectively detect adversarial examples crafted by non-adaptive attackers.
Detecting Black-box Adversarial Examples through Nonlinear Dimensionality Reduction
TLDR
This work proposes a detection method based on combining non-linear dimensionality reduction and density estimation techniques and empirical findings show that the proposed approach is able to effectively detect adversarial examples crafted by non-adaptive attackers.
Towards Robust Detection of Adversarial Examples
TLDR
This paper presents a novel training procedure and a thresholding test strategy towards robust detection of adversarial examples, and proposes to minimize the reverse cross-entropy (RCE), which encourages a deep network to learn latent representations that better distinguish adversarial examples from normal ones.
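A hedged sketch of one common reading of the reverse cross-entropy objective: cross-entropy against a "reverse" label that puts zero mass on the true class and uniform mass on all other classes. This is illustrative only and may not match every detail of the cited training procedure.

```python
import torch
import torch.nn.functional as F

def reverse_cross_entropy(logits, targets):
    """Cross-entropy against the 'reverse' label distribution.

    logits:  (batch, num_classes) raw network outputs
    targets: (batch,) integer class labels
    """
    num_classes = logits.size(1)
    # Reverse label: 0 on the true class, 1/(K-1) on every other class.
    reverse = torch.full_like(logits, 1.0 / (num_classes - 1))
    reverse.scatter_(1, targets.unsqueeze(1), 0.0)
    log_probs = F.log_softmax(logits, dim=1)
    return -(reverse * log_probs).sum(dim=1).mean()
```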
Where Classification Fails, Interpretation Rises
TLDR
This work builds upon recent advances in interpretable models and constructs a new detection framework that contrasts an input's interpretation against its classification, arguing that this opens a new direction for designing adversarial input detection methods.
Learning to Characterize Adversarial Subspaces
TLDR
This work proposes a novel adversarial detection method that identifies adversaries by adaptively learning reasonable metrics to characterize adversarial subspaces, and proposes an innovative model called Neighbor Context Encoder (NCE) that learns from the context of k neighbors and infers whether the detected sample is normal or adversarial.
Adversarial Examples Are Not Bugs, They Are Features
TLDR
It is demonstrated that adversarial examples can be directly attributed to the presence of non-robust features: features derived from patterns in the data distribution that are highly predictive, yet brittle and incomprehensible to humans.
Divide-and-Conquer Adversarial Detection
TLDR
This paper trains adversary-robust auxiliary detectors to discriminate in-class natural examples from adversarially crafted out-of-class examples, and demonstrates that, with this novel training scheme, the models learn significantly more robust representations than with ordinary adversarial training.
Adversarial Examples on Object Recognition
TLDR
This survey introduces the hypotheses behind the existence of adversarial examples, the methods used to construct them or protect against them, and the capacity to transfer adversarial examples between different machine learning models.
...

References

Showing 1-10 of 45 references
Detecting Adversarial Samples from Artifacts
TLDR
This paper investigates model confidence on adversarial samples by looking at Bayesian uncertainty estimates, available in dropout neural networks, and by performing density estimation in the subspace of deep features learned by the model; the results yield a method for implicit adversarial detection that is oblivious to the attack algorithm.
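A minimal sketch of the two detection signals described above, assuming a `predict_stochastic` callable that runs the network with dropout active and pre-extracted deep features; the bandwidth and number of samples are placeholder assumptions.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def mc_dropout_uncertainty(predict_stochastic, x, n_samples=50):
    """Predictive uncertainty from Monte-Carlo dropout: run the network several
    times with dropout active and measure the spread of the softmax outputs.

    predict_stochastic: callable x -> (num_classes,) softmax vector, dropout ON
    """
    probs = np.stack([predict_stochastic(x) for _ in range(n_samples)])
    return probs.var(axis=0).sum()  # scalar uncertainty score

def fit_density(clean_features, labels):
    """Per-class kernel density estimates in the deep-feature space."""
    return {c: KernelDensity(bandwidth=1.0).fit(clean_features[labels == c])
            for c in np.unique(labels)}

def density_score(kdes, features, predicted_class):
    """Log-density of an input's features under the KDE of its predicted class;
    adversarial inputs tend to land in low-density regions."""
    return kdes[predicted_class].score_samples(features.reshape(1, -1))[0]
```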
Early Methods for Detecting Adversarial Images
TLDR
The best detection method reveals that adversarial images place abnormal emphasis on the lower-ranked principal components from PCA, and adversaries trying to bypass detectors must make the adversarial image less pathological or they will fail trying.
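An illustrative sketch of a PCA-based score in the spirit of the detection method described above: fit PCA on clean images and measure how much of an input's energy falls into the lower-ranked components. The `keep` cutoff and the exact score are assumptions, not the cited paper's recipe.

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_pca_detector(clean_images, keep=50):
    """Fit PCA on flattened clean images; the detector later scores how much
    energy an input puts into the lower-ranked (late) principal components."""
    pca = PCA().fit(clean_images.reshape(len(clean_images), -1))
    return pca, keep

def low_component_energy(pca, keep, image):
    """Fraction of the image's energy beyond the first `keep` components.
    Unusually high values suggest an adversarial image."""
    coeffs = pca.transform(image.reshape(1, -1))[0]
    return np.sum(coeffs[keep:] ** 2) / np.sum(coeffs ** 2)
```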
Adversarial and Clean Data Are Not Twins
TLDR
This paper shows that a simple binary classifier can be built that separates adversarial data from clean data with accuracy over 99%, and empirically shows that the binary classifier is robust to a second-round adversarial attack.
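A minimal sketch of the binary-detector idea: label clean inputs 0 and adversarial inputs 1 and fit a classifier on them. The cited paper trains a learned classifier on the inputs; a logistic regression stands in here purely to keep the example short.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_adversarial_detector(clean_inputs, adversarial_inputs):
    """Train a simple binary detector on flattened inputs:
    label 0 = clean, label 1 = adversarial."""
    X = np.concatenate([clean_inputs, adversarial_inputs])
    X = X.reshape(len(X), -1)
    y = np.concatenate([np.zeros(len(clean_inputs)),
                        np.ones(len(adversarial_inputs))])
    return LogisticRegression(max_iter=1000).fit(X, y)
```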
On Detecting Adversarial Perturbations
TLDR
It is shown empirically that adversarial perturbations can be detected surprisingly well even though they are quasi-imperceptible to humans.
On the (Statistical) Detection of Adversarial Examples
TLDR
It is shown that statistical properties of adversarial examples are essential to their detection: they are not drawn from the same distribution as the original data and can thus be detected using statistical tests.
Explaining and Harnessing Adversarial Examples
TLDR
It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
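The linearity argument is usually made concrete with the fast gradient sign method introduced in this reference: a single step of size epsilon in the direction of the sign of the loss gradient. A minimal PyTorch sketch, where epsilon and the [0, 1] clamp are assumptions about the input range:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon=0.03):
    """Fast gradient sign method: one epsilon-sized step along the sign of the
    gradient of the classification loss with respect to the input."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + epsilon * x.grad.sign()).clamp(0.0, 1.0).detach()
```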
Adversarial Examples Detection in Deep Networks with Convolutional Filter Statistics
  • Xin Li, Fuxin Li
  • 2017 IEEE International Conference on Computer Vision (ICCV)
  • 2017
TLDR
After detecting adversarial examples, it is shown that many of them can be recovered by simply performing a small average filter on the image, which should lead to more insights about the classification mechanisms in deep convolutional neural networks.
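A minimal sketch of the average-filter recovery step mentioned above, assuming image tensors in NCHW layout; the kernel size is an assumption.

```python
import torch
import torch.nn.functional as F

def average_filter_recover(images, kernel_size=3):
    """Smooth each image with a small box (average) filter before re-classifying.

    images: (batch, channels, height, width) tensor in [0, 1]
    """
    channels = images.size(1)
    kernel = torch.full((channels, 1, kernel_size, kernel_size),
                        1.0 / (kernel_size * kernel_size), device=images.device)
    # groups=channels applies the same box filter to each channel independently.
    return F.conv2d(images, kernel, padding=kernel_size // 2, groups=channels)
```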
Towards Evaluating the Robustness of Neural Networks
TLDR
It is demonstrated that defensive distillation does not significantly increase the robustness of neural networks, and three new attack algorithms are introduced that succeed on both distilled and undistilled neural networks with 100% probability.
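For orientation, a stripped-down sketch of a C&W-style L2 targeted attack of the kind this reference introduces. The tanh change of variables and the binary search over the trade-off constant from the original attack are omitted, so treat this as an approximation rather than the paper's algorithm.

```python
import torch

def cw_l2_attack(model, x, target, steps=200, lr=0.01, c=1.0, kappa=0.0):
    """Optimize a perturbation that trades off L2 distortion against a
    logit-margin term pushing the target class above all others."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        adv = (x + delta).clamp(0.0, 1.0)
        logits = model(adv)
        # Best non-target logit for each example in the batch.
        mask = torch.ones_like(logits, dtype=torch.bool)
        mask[:, target] = False
        other_max = logits.masked_fill(~mask, float("-inf")).max(dim=1).values
        # Margin term: target logit should exceed the best other logit by kappa.
        margin = torch.clamp(other_max - logits[:, target], min=-kappa)
        loss = (delta ** 2).sum() + c * margin.sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta).clamp(0.0, 1.0).detach()
```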
Towards Deep Neural Network Architectures Robust to Adversarial Examples
TLDR
Deep Contractive Network is proposed, a model with a new end-to-end training procedure that includes a smoothness penalty inspired by the contractive autoencoder (CAE) to increase the network robustness to adversarial examples, without a significant performance penalty.
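A hedged sketch of a contractive-style smoothness penalty: an estimate of the squared Frobenius norm of the Jacobian of a hidden representation with respect to the input, via random projections. The layer-wise formulation of the cited paper is simplified here, and `model_hidden` is a hypothetical callable returning a hidden-layer activation.

```python
import torch

def contractive_penalty(model_hidden, x, num_projections=1):
    """Estimate ||d h / d x||_F^2 for a hidden representation h = model_hidden(x),
    using random Gaussian projections in hidden space (E[||J^T v||^2] = ||J||_F^2)."""
    x = x.clone().detach().requires_grad_(True)
    h = model_hidden(x)                      # (batch, hidden_dim)
    penalty = 0.0
    for _ in range(num_projections):
        v = torch.randn_like(h)              # random direction in hidden space
        (grads,) = torch.autograd.grad((h * v).sum(), x, create_graph=True)
        penalty = penalty + (grads ** 2).sum()
    return penalty / num_projections
```

Added to the ordinary classification loss with a small weight, this penalty discourages sharp changes of the representation under small input perturbations.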
Learning with a Strong Adversary
TLDR
A new and simple way of finding adversarial examples is presented and experimentally shown to be efficient, and training with it greatly improves the robustness of the classification models produced.
...