The Limitations of Deep Learning in Adversarial Settings

@article{Papernot2016TheLO,
  title={The Limitations of Deep Learning in Adversarial Settings},
  author={Nicolas Papernot and Patrick Mcdaniel and Somesh Jha and Matt Fredrikson and Z. Berkay Celik and Ananthram Swami},
  journal={2016 IEEE European Symposium on Security and Privacy (EuroS\&P)},
  year={2016},
  pages={372-387}
}
Deep learning takes advantage of large datasets and computationally efficient training algorithms to outperform other approaches at various machine learning tasks. [...] In an application to computer vision, we show that our algorithms can reliably produce samples correctly classified by human subjects but misclassified in specific targets by a DNN with a 97% adversarial success rate while only modifying on average 4.02% of the input features per sample. We then evaluate the vulnerability of…
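
The attack family the abstract refers to is built on adversarial saliency maps computed from the network's forward derivative (its Jacobian with respect to the input). As a rough illustration only, the sketch below shows the core saliency computation for a small PyTorch classifier; the function interface and the single-feature scoring are my simplifications, and the paper's full attack additionally iterates, perturbing the highest-saliency features until the target class is reached or a distortion budget is exhausted.

import torch

def saliency_map(model, x, target):
    # Sketch of a Jacobian-based saliency map in the spirit of the paper's attack.
    # For each input feature, combine the derivative of the target-class score
    # (to be increased) with the summed derivatives of all other class scores
    # (to be decreased).
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)                          # shape: (1, num_classes)
    num_classes = logits.shape[1]

    # One Jacobian row per class: d logits[c] / d x, flattened over features.
    jac = torch.stack([
        torch.autograd.grad(logits[0, c], x, retain_graph=True)[0].flatten()
        for c in range(num_classes)
    ])                                         # shape: (num_classes, num_features)

    d_target = jac[target]                     # derivative of the target class score
    d_others = jac.sum(dim=0) - d_target       # summed derivatives of the other classes

    # Features are salient only if they push the target score up and the others down.
    return torch.where((d_target > 0) & (d_others < 0),
                       d_target * d_others.abs(),
                       torch.zeros_like(d_target))

Citations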
Fidelity: A Property of Deep Neural Networks to Measure the Trustworthiness of Prediction Results
TLDR
The root cause of adversarial examples is analysed, and a new property of machine learning models, fidelity, is proposed to describe the gap between what a model learns and the ground truth learned by humans, together with a novel approach to quantify it.
Simple Black-Box Adversarial Perturbations for Deep Networks
TLDR
This work focuses on deep convolutional neural networks and demonstrates that adversaries can easily craft adversarial examples even without any internal knowledge of the target network.
R2AD: Randomization and Reconstructor-based Adversarial Defense for Deep Neural Networks
TLDR
A two-stage adversarial defense technique (R2AD) is proposed to thwart the attacker's exploitation of the deep neural network; it includes a random nullification (RNF) layer that randomly removes some of the input features to reduce the impact of adversarial noise.
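
As a rough sketch of what the random-nullification idea could look like in practice (PyTorch assumed; the module name and nullification rate are illustrative, and the reconstructor stage named in the title is omitted):

import torch

class RandomNullification(torch.nn.Module):
    # Random nullification (RNF) sketch: at inference time, zero out a random
    # subset of input features so that any adversarial noise riding on those
    # features is dropped along with them.
    def __init__(self, nullify_rate=0.1):
        super().__init__()
        self.nullify_rate = nullify_rate

    def forward(self, x):
        # Fresh random mask on every forward pass; unlike dropout, this is
        # meant to stay active at test time and nothing is rescaled.
        mask = (torch.rand_like(x) >= self.nullify_rate).to(x.dtype)
        return x * mask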
Generating Adversarial Samples With Constrained Wasserstein Distance
TLDR
W-PGD is proposed to generate adversarial samples with constrained Wasserstein distance that stay close to the normal data distribution, allowing them to bypass distribution-based detection techniques and effectively decreasing their detection rate.
Detecting Adversarial Examples Using Data Manifolds
TLDR
The goal of finding limitations of the learning model presents a more tractable approach to protecting against adversarial attacks, based on identifying a low-dimensional manifold in which the training samples lie and using the distance of a new observation from this manifold to decide whether the data point is adversarial.
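
One simple way to instantiate this idea, purely as an illustration and not necessarily the authors' construction, is to approximate the manifold with a PCA subspace and score new inputs by their reconstruction error:

import numpy as np

def fit_linear_manifold(train_x, dim=32):
    # Fit a low-dimensional linear "manifold" (a PCA subspace) to the training data.
    mean = train_x.mean(axis=0)
    # Principal directions from the SVD of the centered data matrix.
    _, _, vt = np.linalg.svd(train_x - mean, full_matrices=False)
    return mean, vt[:dim]               # shapes: (n_features,), (dim, n_features)

def manifold_distance(x, mean, components):
    # Distance of one sample from the subspace: norm of its reconstruction residual.
    centered = x - mean
    projection = centered @ components.T @ components
    return np.linalg.norm(centered - projection)

# A sample is flagged as adversarial when its distance exceeds a threshold
# calibrated on held-out clean data (threshold selection is the defender's choice).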
Detecting Adversarial Examples for Deep Neural Networks via Layer Directed Discriminative Noise Injection
TLDR
A new discriminative noise injection strategy is proposed to adaptively select a few dominant layers and progressively discriminate adversarial from benign inputs; this is made possible by evaluating the differences in label change rate between adversarial and natural images when different amounts of noise are injected into the weights of individual layers of the model.
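
A much-simplified sketch of the label-change-rate test (single layer, fixed noise scale, PyTorch assumed; the adaptive layer selection and noise schedule described above are omitted, and the layer and sigma choices are mine):

import torch

def label_change_rate(model, x, layer, sigma=0.05, trials=20):
    # Perturb one layer's weights with Gaussian noise and count how often the
    # predicted label for x flips; adversarial inputs, lying close to decision
    # boundaries, tend to flip more often than natural ones.
    original = layer.weight.detach().clone()
    flips = 0
    with torch.no_grad():
        base_label = model(x).argmax(dim=1).item()
        for _ in range(trials):
            layer.weight.copy_(original + sigma * torch.randn_like(original))
            flips += int(model(x).argmax(dim=1).item() != base_label)
        layer.weight.copy_(original)    # restore the clean weights
    return flips / trials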
Detecting Adversarial Samples from Artifacts
TLDR
This paper investigates model confidence on adversarial samples by looking at Bayesian uncertainty estimates available in dropout neural networks, and by performing density estimation in the subspace of deep features learned by the model; the results yield a method for implicit adversarial detection that is oblivious to the attack algorithm.
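
The dropout-based uncertainty estimate can be sketched as Monte Carlo dropout: several stochastic forward passes with dropout left on, using the variance of the predictions as the score. The snippet below assumes a PyTorch model with dropout layers and leaves out the density-estimation part:

import torch

def mc_dropout_uncertainty(model, x, passes=30):
    # Monte Carlo dropout: keep dropout active at test time, run several
    # stochastic forward passes, and use the variance of the predicted
    # probabilities as an uncertainty score; adversarial inputs tend to
    # score higher than clean ones.
    model.train()        # keeps dropout sampling on (a real implementation would
                         # switch only the dropout layers, to avoid touching batch norm)
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=1) for _ in range(passes)])
    model.eval()
    return probs.var(dim=0).sum().item()    # scalar uncertainty score for this input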
Simple Black-Box Adversarial Attacks on Deep Neural Networks
TLDR
This work focuses on deep convolutional neural networks and demonstrates that adversaries can easily craft adversarial examples even without any internal knowledge of the target network, and proposes schemes that could serve as a litmus test for designing robust networks.
Detecting adversarial examples via prediction difference for deep neural networks
TLDR
It is found that adversarial examples show a larger prediction difference across various DNN models, due to the models' varied and complicated decision boundaries, which can be used to identify adversarial examples by thresholding the prediction difference.
Efficient Two-Step Adversarial Defense for Deep Neural Networks
TLDR
This paper empirically demonstrates the effectiveness of the proposed two-step defense approach against different attack methods and its improvements over existing defense strategies, allowing defense against adversarial attacks with a robustness level comparable to that of adversarial training with multi-step adversarial examples.

References

Showing 1-10 of 46 references
Towards Deep Neural Network Architectures Robust to Adversarial Examples
TLDR
Deep Contractive Network is proposed, a model with a new end-to-end training procedure that includes a smoothness penalty inspired by the contractive autoencoder (CAE), to increase the network's robustness to adversarial examples without a significant performance penalty.
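
The smoothness penalty can be sketched as a Jacobian norm term added to the training loss. The exact Jacobian below is only feasible for small models, whereas the paper uses a cheaper layer-wise approximation; PyTorch is assumed and lambda_c is a hypothetical weighting hyperparameter:

import torch

def contractive_penalty(model, x):
    # Smoothness penalty sketch: squared Frobenius norm of the Jacobian of the
    # model output with respect to the input, discouraging small input changes
    # from producing large output changes.
    jac = torch.autograd.functional.jacobian(model, x, create_graph=True)
    return (jac ** 2).sum()

# Hypothetical training-time usage:
# loss = task_loss + lambda_c * contractive_penalty(model, x)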
Explaining and Harnessing Adversarial Examples
TLDR
It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
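
The linearity argument leads directly to the fast gradient sign method introduced in this paper; a minimal PyTorch sketch, with the cross-entropy loss, step size, and [0, 1] clipping as my own assumptions:

import torch
import torch.nn.functional as F

def fgsm(model, x, label, eps=0.03):
    # Fast gradient sign method: one step of size eps in the direction of the
    # sign of the loss gradient with respect to the input.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()   # keep pixels in [0, 1]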
Generative Adversarial Nets
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Support Vector Machines Under Adversarial Label Noise
TLDR
This paper assumes that the adversary has control over some of the training data and aims to subvert the SVM learning process, and it proposes a strategy to improve the robustness of SVMs to training-data manipulation based on a simple kernel matrix correction.
Deep neural networks are easily fooled: High confidence predictions for unrecognizable images
TLDR
This work takes convolutional neural networks trained to perform well on either the ImageNet or MNIST datasets and finds images with evolutionary algorithms or gradient ascent that DNNs label with high confidence as belonging to each dataset class, producing fooling images that raise questions about the generality of DNN computer vision.
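
The gradient-ascent variant of the search can be sketched in a few lines: start from noise and repeatedly push the image toward higher confidence for one class (PyTorch assumed; step count, learning rate, and image shape are illustrative):

import torch

def fooling_image(model, target_class, shape=(1, 3, 224, 224), steps=200, lr=0.1):
    # Gradient-ascent sketch: start from random noise and repeatedly move the
    # image in the direction that raises the confidence of one class, producing
    # an unrecognizable image the network nonetheless labels with high confidence.
    x = torch.rand(shape, requires_grad=True)
    opt = torch.optim.SGD([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        confidence = torch.softmax(model(x), dim=1)[0, target_class]
        (-confidence).backward()     # minimizing the negative = ascending on confidence
        opt.step()
        with torch.no_grad():
            x.clamp_(0, 1)           # keep pixel values in a valid range
    return x.detach()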
How transferable are features in deep neural networks?
TLDR
This paper quantifies the generality versus specificity of neurons in each layer of a deep convolutional neural network and reports a few surprising results, including that initializing a network with transferred features from almost any number of layers can produce a boost to generalization that lingers even after fine-tuning to the target dataset.
Intriguing properties of neural networks
TLDR
It is found that there is no distinction between individual high-level units and random linear combinations of high-level units, according to various methods of unit analysis, and it is suggested that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.
Evasion Attacks against Machine Learning at Test Time
TLDR
This work presents a simple but effective gradient-based approach that can be exploited to systematically assess the security of several widely used classification algorithms against evasion attacks.
Adversarial Machine Learning
J. Tygar, IEEE Internet Computing, 2011
The author briefly introduces the emerging field of adversarial machine learning, in which opponents can cause traditional machine learning algorithms to behave poorly in security applications.
Pattern Recognition Systems under Attack: Design Issues and Research Challenges
TLDR
The ultimate goal is to provide some useful guidelines for improving the security of pattern recognition in adversarial settings, and to suggest related open issues to foster research in this area.