Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks

@article{Papernot2016DistillationAA,
  title={Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks},
  author={Nicolas Papernot and Patrick McDaniel and Xi Wu and Somesh Jha and Ananthram Swami},
  journal={2016 IEEE Symposium on Security and Privacy (SP)},
  year={2016},
  pages={582-597}
}
Deep learning algorithms have been shown to perform extremely well on many classical machine learning problems. However, recent studies have shown that deep learning, like other machine learning techniques, is vulnerable to adversarial samples: inputs crafted to force a deep neural network (DNN) to provide adversary-selected outputs. Such attacks can seriously undermine the security of the system supported by the DNN, sometimes with devastating consequences. For example, autonomous vehicles can… 
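
As a rough orientation, the defense studied here trains an initial network with its softmax scaled by a temperature T, then trains a distilled network of the same architecture on the first network's softened probabilities, and deploys the distilled network at temperature 1. The sketch below is a minimal, assumed PyTorch rendering of that recipe (toy MLP, random stand-in data, simplified training loops), not the authors' implementation.

# Hedged sketch of defensive distillation; architectural and training details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_mlp(in_dim=784, hidden=256, classes=10):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, classes))

def distillation_defense(x, y, T=20.0, epochs=5, lr=1e-3):
    """Train a teacher at temperature T, then train a student of the same
    architecture on the teacher's softened probabilities."""
    teacher, student = make_mlp(), make_mlp()

    # Stage 1: train the teacher on hard labels, with logits divided by T.
    opt = torch.optim.Adam(teacher.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(teacher(x) / T, y).backward()
        opt.step()

    # Stage 2: train the student on the teacher's soft labels at the same T.
    with torch.no_grad():
        soft_labels = F.softmax(teacher(x) / T, dim=1)
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        log_probs = F.log_softmax(student(x) / T, dim=1)
        loss = -(soft_labels * log_probs).sum(dim=1).mean()  # cross-entropy with soft targets
        loss.backward()
        opt.step()

    return student  # deployed at temperature 1, i.e. plain softmax(student(x))

# Toy usage with random data standing in for a real dataset.
x = torch.randn(128, 784)
y = torch.randint(0, 10, (128,))
model = distillation_defense(x, y)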

DeepCloak: Masking Deep Neural Network Models for Robustness Against Adversarial Samples

TLDR
By identifying and removing unnecessary features in a DNN model, DeepCloak limits the capacity an attacker can use to generate adversarial samples and therefore increases the robustness against such inputs.
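
A hedged sketch of the masking idea as summarized above: a fixed 0/1 mask zeroes out features deemed unnecessary before the final classifier. The selection criterion used below (features whose activations shift most under perturbation) is an assumption for illustration, not necessarily DeepCloak's exact rule.

# Hedged sketch: a fixed feature-mask layer that removes selected features.
import torch
import torch.nn as nn

class FeatureMask(nn.Module):
    def __init__(self, num_features, masked_idx):
        super().__init__()
        mask = torch.ones(num_features)
        mask[masked_idx] = 0.0              # remove selected features
        self.register_buffer("mask", mask)

    def forward(self, feats):
        return feats * self.mask            # zero out masked features

# Toy usage: mask the k features whose activations shift most under perturbation
# (assumed criterion), then feed the masked features to the final classifier layer.
torch.manual_seed(0)
clean_feats = torch.randn(64, 128)                       # penultimate-layer activations (clean)
adv_feats = clean_feats + 0.5 * torch.randn(64, 128)     # same inputs, perturbed
shift = (adv_feats - clean_feats).abs().mean(dim=0)
masked_idx = torch.topk(shift, k=16).indices
mask_layer = FeatureMask(128, masked_idx)
robust_feats = mask_layer(clean_feats)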

Secure machine learning against adversarial samples at test time

TLDR
This paper proposes a new iterative adversarial retraining approach to robustify the model and to reduce the effectiveness of adversarial inputs on DNN models, and develops a parallel implementation that makes the proposed approach scalable for large datasets and complex models.

DNNShield: Dynamic Randomized Model Sparsification, A Defense Against Adversarial Machine Learning

TLDR
DNNSHIELD is proposed, a hardware-accelerated defense that adapts the strength of its response to the confidence that an input is adversarial; it exceeds the detection rate of state-of-the-art approaches with much lower overhead.

DeepMask: Masking DNN Models for robustness against adversarial samples

TLDR
By identifying and removing unnecessary features in a DNN model, DeepCloak limits the capacity an attacker can use to generate adversarial samples and therefore increases the robustness against such inputs.

R2AD: Randomization and Reconstructor-based Adversarial Defense for Deep Neural Networks

TLDR
A two-stage adversarial defense technique (R2AD) is proposed to thwart the attacker's exploitation of the deep neural network; it includes a random nullification (RNF) layer that randomly nullifies/removes some features from the input to reduce the impact of adversarial noise.
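
A minimal sketch of a random nullification layer as described; the nullification rate and its placement in front of the classifier are assumptions.

# Hedged sketch of a random nullification (RNF) layer: randomly drop input
# features on every forward pass, including at test time, to dilute adversarial noise.
import torch
import torch.nn as nn

class RandomNullification(nn.Module):
    def __init__(self, drop_rate=0.1):
        super().__init__()
        self.drop_rate = drop_rate

    def forward(self, x):
        # Sample a fresh binary keep-mask for each call (assumed rate).
        keep = (torch.rand_like(x) >= self.drop_rate).float()
        return x * keep

# Toy usage: prepend the RNF layer to an arbitrary classifier.
model = nn.Sequential(RandomNullification(0.1), nn.Flatten(), nn.Linear(784, 10))
logits = model(torch.randn(32, 1, 28, 28))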

Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples

TLDR
This work introduces the first practical demonstration that the cross-model transfer phenomenon enables attackers to control a remotely hosted DNN with no access to the model, its parameters, or its training data, and introduces the attack strategy of fitting a substitute model to input-output pairs obtained in this manner and then crafting adversarial examples based on this auxiliary model.
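
The strategy summarized above can be sketched as follows; the oracle stand-in, the substitute architecture, and the FGSM-style crafting step are illustrative assumptions.

# Hedged sketch of the substitute-model black-box attack: query the remote model
# for labels, fit a local substitute, then craft adversarial examples against the
# substitute and rely on transferability.
import torch
import torch.nn as nn
import torch.nn.functional as F

def query_oracle(x):
    """Stand-in for the remotely hosted DNN: returns only predicted labels."""
    return (x.flatten(1).sum(dim=1) > 0).long()   # dummy two-class black box

def fit_substitute(x, epochs=50):
    labels = query_oracle(x)                      # label a small seed set via queries
    sub = nn.Sequential(nn.Flatten(), nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 2))
    opt = torch.optim.Adam(sub.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(sub(x), labels).backward()
        opt.step()
    return sub

def craft_adversarial(sub, x, eps=0.1):
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(sub(x), query_oracle(x.detach()))
    loss.backward()
    return (x + eps * x.grad.sign()).detach()     # transfer these back to the oracle

seed = torch.randn(64, 1, 28, 28)
substitute = fit_substitute(seed)
adv = craft_adversarial(substitute, seed)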

Defensive dropout for hardening deep neural networks under adversarial attacks

TLDR
This work provides a solution for hardening DNNs under adversarial attacks through defensive dropout; compared with stochastic activation pruning (SAP), another defense that introduces randomness into the DNN model, defensive dropout achieves much larger variances of the gradients, which is the key to the improved defense effect.
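
A minimal sketch of the core idea, keeping a dropout layer stochastic at inference time so that attacker-estimated gradients become noisy; the dropout rate and placement are assumptions.

# Hedged sketch of defensive dropout: dropout stays active even in eval() mode.
import torch
import torch.nn as nn

class DefensiveDropoutNet(nn.Module):
    def __init__(self, p=0.3):
        super().__init__()
        self.p = p
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU())
        self.head = nn.Linear(256, 10)

    def forward(self, x):
        h = self.backbone(x)
        # Keep dropout stochastic regardless of train/eval mode.
        h = nn.functional.dropout(h, p=self.p, training=True)
        return self.head(h)

net = DefensiveDropoutNet().eval()
x = torch.randn(8, 1, 28, 28)
print(net(x).argmax(dim=1))      # predictions vary slightly across calls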

Learning Adversary-Resistant Deep Neural Networks

TLDR
A generic approach to escalate a DNN's resistance to adversarial samples is proposed, making it robust even if the underlying learning algorithm is revealed, and it typically provides superior classification performance and resistance in comparison with state-of-the-art solutions.

Detecting Adversarial Samples for Deep Neural Networks through Mutation Testing

TLDR
A statistical adversary detection algorithm called nMutant (inspired by mutation testing from the software engineering community) is designed and shown to effectively detect most of the adversarial samples generated by recently proposed attack methods.
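
A hedged reading of the detection idea: adversarial inputs tend to lie close to decision boundaries, so small random mutations flip their predicted label far more often than for natural inputs. The mutation operator, sample count, and threshold below are assumptions, not the paper's exact procedure.

# Hedged sketch of mutation-testing-style detection: apply many small random
# mutations to an input and flag it as adversarial if its label is unstable.
import torch

def label_change_rate(model, x, n_mutants=100, sigma=0.05):
    with torch.no_grad():
        base = model(x).argmax(dim=1)
        flips = 0
        for _ in range(n_mutants):
            mutant = x + sigma * torch.randn_like(x)   # assumed Gaussian mutation
            flips += int((model(mutant).argmax(dim=1) != base).item())
    return flips / n_mutants

def is_adversarial(model, x, threshold=0.2):
    return label_change_rate(model, x) > threshold

# Toy usage with a linear model on a single input.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 10))
x = torch.randn(1, 1, 28, 28)
print(is_adversarial(model, x))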

On the Security of Randomized Defenses Against Adversarial Samples

TLDR
This work thoroughly and empirically analyzes the impact of randomization techniques against all classes of adversarial strategies and devises a lightweight randomization strategy for image classification based on feature squeezing, which pre-processes the classifier input by embedding randomness within each feature before applying feature squeezing.
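
One plausible rendering of that pipeline, assuming "feature squeezing" means bit-depth reduction and with the noise magnitude and bit depth chosen arbitrarily for illustration.

# Hedged sketch of randomized feature squeezing: add per-feature randomness to the
# input, then squeeze it by reducing color bit depth.
import numpy as np

def randomize_then_squeeze(x, noise=0.03, bits=4, rng=None):
    """x: image array with values in [0, 1]."""
    if rng is None:
        rng = np.random.default_rng()
    x = np.clip(x + rng.uniform(-noise, noise, size=x.shape), 0.0, 1.0)  # embed randomness
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels     # bit-depth reduction ("squeezing")

img = np.random.default_rng(1).random((28, 28))
defended = randomize_then_squeeze(img)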
...

References

SHOWING 1-10 OF 51 REFERENCES

The Limitations of Deep Learning in Adversarial Settings

TLDR
This work formalizes the space of adversaries against deep neural networks (DNNs) and introduces a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs.

Towards Deep Neural Network Architectures Robust to Adversarial Examples

TLDR
Deep Contractive Network is proposed, a model with a new end-to-end training procedure that includes a smoothness penalty inspired by the contractive autoencoder (CAE) to increase the network robustness to adversarial examples, without a significant performance penalty.

Evasion Attacks against Machine Learning at Test Time

TLDR
This work presents a simple but effective gradient-based approach that can be exploited to systematically assess the security of several, widely-used classification algorithms against evasion attacks.

Support Vector Machines Under Adversarial Label Noise

TLDR
This paper assumes that the adversary has control over some training data and aims to subvert the SVM learning process, and proposes a strategy to improve the robustness of SVMs to training data manipulation based on a simple kernel matrix correction.

Explaining and Harnessing Adversarial Examples

TLDR
It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
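
This is the paper that introduced the fast gradient sign method (FGSM); a minimal sketch follows directly from the linearity argument, with the model, loss, and epsilon below as placeholders.

# Minimal FGSM sketch: perturb the input by epsilon in the direction of the sign
# of the loss gradient with respect to the input.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.1):
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 10))
x = torch.rand(16, 1, 28, 28)          # inputs in [0, 1]
y = torch.randint(0, 10, (16,))
x_adv = fgsm(model, x, y)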

Adversarial machine learning

TLDR
A taxonomy for classifying attacks against online machine learning algorithms and the limits of an adversary's knowledge about the algorithm, feature space, training, and input data are given.

Analysis of classifiers’ robustness to adversarial perturbations

TLDR
A general upper bound on the robustness of classifiers to adversarial perturbations is established, and the phenomenon of adversarial instability is suggested to be due to the low flexibility of classifiers compared to the difficulty of the classification task (captured mathematically by the distinguishability measure).

Pattern Recognition Systems under Attack: Design Issues and Research Challenges

TLDR
The ultimate goal is to provide some useful guidelines for improving the security of pattern recognition in adversarial settings, and to suggest related open issues to foster research in this area.

Security Evaluation of Pattern Classifiers under Attack

TLDR
A framework for empirical evaluation of classifier security is proposed that formalizes and generalizes the main ideas in the literature; examples of its use in three real applications show that security evaluation can provide a more complete understanding of the classifier's behavior in adversarial environments and lead to better design choices.

Poisoning Attacks against Support Vector Machines

TLDR
It is demonstrated that an intelligent adversary can, to some extent, predict the change of the SVM's decision function due to malicious input and use this ability to construct malicious data.
...