• Corpus ID: 16863734

On the (Statistical) Detection of Adversarial Examples

Kathrin Grosse, Praveen Manoharan, Nicolas Papernot, Michael Backes, Patrick McDaniel
Machine Learning (ML) models are applied in a variety of tasks such as network intrusion detection or malware classification. [...] Key result: In this way, we show that statistical properties of adversarial examples are essential to their detection.

Secure machine learning against adversarial samples at test time

This paper proposes a new iterative adversarial retraining approach to robustify the model and to reduce the effectiveness of adversarial inputs on DNN models, and develops a parallel implementation that makes the proposed approach scalable for large datasets and complex models.

Adversarial Example Detection and Classification With Asymmetrical Adversarial Training

This paper presents an adversarial example detection method that provides a performance guarantee against norm-constrained adversaries, and uses the learned class-conditional generative models to define generative detection/classification models that are both robust and more interpretable.

Detecting Adversarial Examples Using Data Manifolds

The goal of finding limitations of the learning model presents a more tractable approach to protecting against adversarial attacks, based on identifying a low dimensional manifold in which the training samples lie and using the distance of a new observation from this manifold to identify whether this data point is adversarial or not.

Selective Adversarial Learning for Mobile Malware

The experiment results show that both of the selective mechanisms for adversarial retraining outperform the random selection technique and significantly improve the classifier performance against adversarial attacks.

Unsupervised Detection of Adversarial Examples with Model Explanations

This work proposes a simple yet effective method to detect adversarial examples using methods developed to explain the model’s behavior, and is the first to suggest an unsupervised defense method based on model explanations.

Learning to Detect Adversarial Examples Based on Class Scores

This work proposes to train a support vector machine (SVM) on the class scores of an already trained classification model to detect adversarial examples, and shows that this approach yields an improved detection rate compared to an existing method, whilst being easy to implement.

Divide-and-Conquer Adversarial Detection

This paper trains adversary-robust auxiliary detectors to discriminate in-class natural examples from adversarially crafted out-of-class examples, and demonstrates that with the novel training scheme their models learn significantly more robust representations than with ordinary adversarial training.

DetectSec: Evaluating the robustness of object detection models to adversarial attacks

It is shown that many conclusions about adversarial attacks and defenses in image classification tasks do not transfer to object detection tasks, for example, the targeted attack is stronger than the untargeted attack for two‐stage detectors.

Anomaly Detection of Adversarial Examples using Class-conditional Generative Adversarial Networks

An unsupervised attack detector on DNN classifiers based on class-conditional Generative Adversarial Networks (GANs) is proposed and it is demonstrated that anomalies are harder to detect using features closer to the DNN’s output layer.

EAD: an ensemble approach to detect adversarial examples from the hidden features of deep neural networks

The improvement over the state-of-the-art, and the possibility to easily extend EAD to include any arbitrary set of detectors, pave the way to widespread adoption of ensemble approaches in the broad field of adversarial example detection.



Detecting Adversarial Samples from Artifacts

This paper investigates model confidence on adversarial samples by looking at Bayesian uncertainty estimates, available in dropout neural networks, and by performing density estimation in the subspace of deep features learned by the model, and results show a method for implicit adversarial detection that is oblivious to the attack algorithm.
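The density-estimation idea in this summary can be illustrated with a minimal sketch. It assumes a Gaussian kernel, a hand-picked bandwidth, and toy 2-D feature vectors; the function name `gaussian_kde_score` and all values are hypothetical and are not taken from the paper. The score is unnormalized and is only meant for comparing points against the same reference set.

```python
import math


def gaussian_kde_score(point, reference_points, bandwidth=1.0):
    """Average Gaussian kernel value between `point` and each reference point.

    A low score means the point lies far from the reference features.
    The kernel form and bandwidth are assumptions of this sketch.
    """
    total = 0.0
    for ref in reference_points:
        squared_distance = sum((p - r) ** 2 for p, r in zip(point, ref))
        total += math.exp(-squared_distance / (bandwidth ** 2))
    return total / len(reference_points)


# Hypothetical 2-D "deep features" of training samples from one class.
training_features = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.2], [0.2, 0.1]]

# A point near the training features scores much higher than a distant one.
typical_score = gaussian_kde_score([0.05, 0.05], training_features)
outlier_score = gaussian_kde_score([3.0, 3.0], training_features)
```

In a detector built this way, an input whose feature-space score falls below a chosen threshold would be flagged as suspicious; how that threshold is set is left open here.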

Adversarial Perturbations Against Deep Neural Networks for Malware Classification

This paper shows how to construct highly-effective adversarial sample crafting attacks for neural networks used as malware classifiers, and evaluates to which extent potential defensive mechanisms against adversarial crafting can be leveraged to the setting of malware classification.

Adversarial Examples Detection in Deep Networks with Convolutional Filter Statistics

  • Xin Li, Fuxin Li
  • Computer Science
    2017 IEEE International Conference on Computer Vision (ICCV)
  • 2017
After detecting adversarial examples, it is shown that many of them can be recovered by simply performing a small average filter on the image, which should lead to more insights about the classification mechanisms in deep convolutional neural networks.
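A small average filter of the kind mentioned in this summary can be sketched as follows. The 3x3 window, the list-of-lists grayscale image, and the clamped (replicate) border handling are assumptions made for illustration; the summary does not state which filter size or border rule the authors used.

```python
def mean_filter(image, size=3):
    """Return a smoothed copy of a 2-D grayscale image (list of rows).

    Each output pixel is the mean of the size x size window around it.
    Coordinates outside the image are clamped to the nearest edge pixel,
    which is an assumption of this sketch.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    height, width = len(image), len(image[0])
    radius = size // 2
    smoothed = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            total = 0.0
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr = min(max(r + dr, 0), height - 1)
                    cc = min(max(c + dc, 0), width - 1)
                    total += image[rr][cc]
            smoothed[r][c] = total / (size * size)
    return smoothed


# A flat image with one perturbed pixel: the filter spreads the spike out.
image = [[0.0] * 5 for _ in range(5)]
image[2][2] = 9.0
smoothed = mean_filter(image)
```

The single-pixel spike stands in for a localized perturbation; after filtering, its magnitude at the center drops to one ninth of the original value.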

The Limitations of Deep Learning in Adversarial Settings

This work formalizes the space of adversaries against deep neural networks (DNNs) and introduces a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs.

Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples

This work introduces the first practical demonstration that the cross-model transfer phenomenon enables attackers to control a remotely hosted DNN with no access to the model, its parameters, or its training data, and introduces the attack strategy of fitting a substitute model to the input-output pairs, then crafting adversarial examples based on this auxiliary model.

Explaining and Harnessing Adversarial Examples

It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
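One way to see why linearity matters is a toy linear score. Moving every input coordinate by a small `epsilon` in the direction of the sign of its weight shifts the score by `epsilon` times the sum of absolute weights, so the shift grows with the number of inputs. The weights, inputs, and function names below are hypothetical values chosen for illustration only.

```python
def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def linear_score(weights, inputs):
    """Dot product of weights and inputs."""
    return sum(w * x for w, x in zip(weights, inputs))


def worst_case_perturbation(weights, epsilon):
    """Per-coordinate step of size epsilon that maximally raises the score."""
    return [epsilon * sign(w) for w in weights]


weights = [0.5, -0.25, 0.75, -1.0]
inputs = [1.0, 2.0, -1.0, 0.5]
epsilon = 0.1

delta = worst_case_perturbation(weights, epsilon)
perturbed = [x + d for x, d in zip(inputs, delta)]

# The shift equals epsilon * sum(|w|) up to floating-point rounding.
shift = linear_score(weights, perturbed) - linear_score(weights, inputs)
expected_shift = epsilon * sum(abs(w) for w in weights)
```

No coordinate changes by more than `epsilon`, yet the score moves by the full sum of the per-coordinate contributions.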

Towards Evaluating the Robustness of Neural Networks

It is demonstrated that defensive distillation does not significantly increase the robustness of neural networks, and three new attack algorithms are introduced that are successful on both distilled and undistilled neural networks with 100% probability.

Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples

New transferability attacks between previously unexplored (substitute, victim) pairs of machine learning model classes, most notably SVMs and decision trees, are introduced.

Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks

The study shows that defensive distillation can reduce the effectiveness of sample creation from 95% to less than 0.5% on a studied DNN, and analytically investigates the generalizability and robustness properties granted by the use of defensive distillation when training DNNs.

Towards Robust Deep Neural Networks with BANG

A novel theory is presented to explain why this unpleasant phenomenon exists in deep neural networks and a simple, efficient, and effective training approach is introduced, Batch Adjusted Network Gradients (BANG), which significantly improves the robustness of machine learning models.