SafetyNet: Detecting and Rejecting Adversarial Examples Robustly

@article{Lu2017SafetyNetDA,
  title={SafetyNet: Detecting and Rejecting Adversarial Examples Robustly},
  author={Jiajun Lu and Theerasit Issaranon and David Alexander Forsyth},
  journal={2017 IEEE International Conference on Computer Vision (ICCV)},
  year={2017},
  pages={446-454}
}
We describe a method to produce a network where current methods such as DeepFool have great difficulty producing adversarial samples. Our construction suggests some insights into how deep networks work. We provide a reasonable analysis that our construction is difficult to defeat, and show experimentally that our method is hard to defeat with both Type I and Type II attacks using several standard networks and datasets. This SafetyNet architecture is applied to an important and novel application…
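The truncated abstract does not spell out the detector itself; as I read the paper, SafetyNet classifies discrete codes formed from late-layer ReLU activations with an RBF-SVM. The sketch below is a minimal illustration of that idea, assuming a simple binarization of the activations; the threshold, layer choice, and placeholder data are illustrative, not the authors' configuration.

```python
# Hypothetical sketch of a SafetyNet-style detector: binarize late-layer ReLU
# activations into discrete codes and classify them with an RBF-SVM.
# The quantization threshold, layer choice, and placeholder data are assumptions.
import numpy as np
from sklearn.svm import SVC

def binary_codes(activations, threshold=0.0):
    """Binarize late-layer ReLU activations into a discrete code per sample."""
    # activations: (n_samples, n_units) array of late-layer outputs
    return (activations > threshold).astype(np.float32)

# acts_clean / acts_adv: activations collected from natural and adversarial
# inputs (their collection is outside the scope of this sketch).
acts_clean = np.random.rand(200, 512)          # placeholder natural activations
acts_adv = np.random.rand(200, 512) + 0.5      # placeholder adversarial activations

X = np.vstack([binary_codes(acts_clean), binary_codes(acts_adv)])
y = np.concatenate([np.zeros(200), np.ones(200)])  # 0 = natural, 1 = adversarial

detector = SVC(kernel="rbf", gamma="scale")    # RBF-SVM over the binary codes
detector.fit(X, y)

# At test time, inputs whose codes the SVM flags as adversarial are rejected.
is_adversarial = detector.predict(binary_codes(acts_clean[:5]))
```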
Detection of Adversarial Examples in Deep Neural Networks with Natural Scene Statistics
TLDR
It is demonstrated that natural scene statistical properties are altered by the presence of adversarial perturbations, and three different methods are proposed that exploit these scene statistics to determine whether an input is adversarial or not.
Natural Scene Statistics for Detecting Adversarial Examples in Deep Neural Networks
TLDR
This paper proposes to characterize the adversarial perturbations through the use of natural scene statistics, and designs a classifier that exploits these scene statistics to determine if an input is adversarial or not.
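Both NSS-based detectors above rest on classical natural scene statistics. As one concrete, hedged illustration, the sketch below computes mean-subtracted contrast-normalized (MSCN) coefficients, a standard NSS feature; the Gaussian window width, the summary statistics, and the downstream classifier are assumptions, not the papers' exact feature set.

```python
# Sketch of one standard natural-scene-statistics feature: MSCN coefficients.
# Window width, constant C, and the downstream classifier are illustrative choices.
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn_coefficients(gray_image, sigma=7/6, C=1.0):
    """Mean-subtracted contrast-normalized coefficients of a grayscale image."""
    img = gray_image.astype(np.float64)
    mu = gaussian_filter(img, sigma)                       # local mean
    sigma_map = np.sqrt(np.abs(gaussian_filter(img * img, sigma) - mu * mu))
    return (img - mu) / (sigma_map + C)                    # normalized coefficients

def nss_features(gray_image):
    """Summary statistics of the MSCN field; adversarial noise tends to distort them."""
    mscn = mscn_coefficients(gray_image)
    return np.array([mscn.mean(), mscn.var(),
                     ((mscn - mscn.mean()) ** 3).mean(),   # third moment
                     ((mscn - mscn.mean()) ** 4).mean()])  # fourth moment

# Feature vectors computed for natural and adversarial images can then be fed
# to any binary classifier (SVM, random forest, ...).
```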
Trace and Detect Adversarial Attacks on CNNs Using Feature Response Maps
TLDR
This work proposes a novel detection method for adversarial examples to prevent attacks on convolutional neural networks by tracking adversarial perturbations in feature responses, allowing for automatic detection using average local spatial entropy.
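A minimal sketch of the statistic named above, the average local spatial entropy of a feature response map, computed over non-overlapping windows; the window size and histogram binning are illustrative assumptions rather than the paper's settings.

```python
# Sketch: average local spatial entropy of a 2-D feature response map.
# Window size and number of histogram bins are illustrative assumptions.
import numpy as np

def local_entropy(patch, bins=16):
    """Shannon entropy of the intensity histogram of one patch."""
    hist, _ = np.histogram(patch, bins=bins)
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def average_local_spatial_entropy(feature_map, window=8):
    """Mean entropy over non-overlapping windows of a feature response map."""
    h, w = feature_map.shape
    entropies = [local_entropy(feature_map[i:i + window, j:j + window])
                 for i in range(0, h - window + 1, window)
                 for j in range(0, w - window + 1, window)]
    return float(np.mean(entropies))

# Adversarially perturbed inputs tend to shift this statistic in the feature
# responses, which a simple threshold or classifier can then flag.
```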
Connecting the Dots: Detecting Adversarial Perturbations Using Context Inconsistency
TLDR
This work augments the DNN with a system that learns context consistency rules during training and checks for violations of those rules during testing, and builds a set of auto-encoders appropriately trained so as to output a discrepancy between the input and output if an added adversarial perturbation violates the context consistency rules.
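A hedged sketch of the detection signal described above: an auto-encoder trained on natural (context-consistent) representations flags inputs whose reconstruction discrepancy is unusually large. The architecture, feature dimensionality, and threshold below are assumptions, not the paper's design.

```python
# Sketch: reconstruction-discrepancy detector built from a small auto-encoder.
# Architecture sizes and the rejection threshold are illustrative assumptions.
import torch
import torch.nn as nn

class ContextAutoEncoder(nn.Module):
    def __init__(self, feat_dim=256, bottleneck=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, bottleneck), nn.ReLU())
        self.decoder = nn.Linear(bottleneck, feat_dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))

def discrepancy(ae, features):
    """L2 reconstruction error; large values suggest violated context consistency."""
    with torch.no_grad():
        return torch.norm(ae(features) - features, dim=1)

# The auto-encoder is trained only on features of natural scenes, so perturbations
# that break the learned context rules produce an unusually large discrepancy.
ae = ContextAutoEncoder()
is_adversarial = discrepancy(ae, torch.randn(4, 256)) > 1.0   # threshold is an assumption
```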
ReabsNet: Detecting and Revising Adversarial Examples
TLDR
The proposed ReabsNet augments an existing classification network with a guardian network that detects whether a sample is natural or has been adversarially perturbed, and it outperforms the state-of-the-art defense method under various adversarial attacks.
DLA: Dense-Layer-Analysis for Adversarial Example Detection
TLDR
It is shown that dense layers of DNNs carry security-sensitive information, and a novel end-to-end framework is presented to detect such attacks without influencing the target model's performance.
Generating Adversarial yet Inconspicuous Patches with a Single Image
TLDR
This work proposes an approach to generate adversarial yet inconspicuous patches from one single image; the patches preserve their attack ability in the physical world and show strong attacking ability in both the white-box and black-box settings.
PAT: Pseudo-Adversarial Training For Detecting Adversarial Videos
TLDR
This paper proposes a novel yet simple algorithm, Pseudo-Adversarial Training (PAT), to detect the adversarial frames in a video without requiring knowledge of the attack, and generates ‘transition frames’ that capture critical deviation from the original frames and eliminate the components insignificant to the detection task.
All You Need is RAW: Defending Against Adversarial Attacks with Camera Image Pipelines
TLDR
This work proposes a model-agnostic adversarial defense method that maps input RGB images to Bayer RAW space and back to RGB through a learned camera image signal processing (ISP) pipeline in order to eliminate potential adversarial patterns.
Deep neural rejection against adversarial examples
TLDR
This work proposes a deep neural rejection mechanism to detect adversarial examples, based on the idea of rejecting samples that exhibit anomalous feature representations at different network layers, and empirically shows that this approach outperforms previously proposed methods that detect adversarial examples by only analyzing the feature representation provided by the output network layer.
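The layer-wise rejection idea can be illustrated with a simplified stand-in: score how far a sample's representation at several layers lies from the centroid of its predicted class and reject when the combined score is too large. The cited work uses per-layer RBF-SVM scores; centroid distances are substituted here only to keep the sketch short.

```python
# Simplified stand-in for layer-wise rejection: distance to the predicted class's
# centroid at several layers, rejected when the summed anomaly score is too high.
import numpy as np

def fit_centroids(layer_feats, labels):
    """layer_feats: list over layers of (n_samples, dim) arrays from natural data."""
    classes = np.unique(labels)
    return [{c: f[labels == c].mean(axis=0) for c in classes} for f in layer_feats]

def rejection_score(centroids, sample_layer_feats, predicted_class):
    """Sum over layers of the distance to the predicted class centroid."""
    return sum(np.linalg.norm(f - cents[predicted_class])
               for f, cents in zip(sample_layer_feats, centroids))

# A sample is rejected as adversarial when rejection_score exceeds a threshold
# calibrated on held-out natural data (the threshold choice is an assumption).
```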
...

References

NO Need to Worry about Adversarial Examples in Object Detection in Autonomous Vehicles
It has been shown that most machine learning algorithms are susceptible to adversarial perturbations. Slightly perturbing an image in a carefully chosen direction in the image space may cause a…
On Detecting Adversarial Perturbations
TLDR
It is shown empirically that adversarial perturbations can be detected surprisingly well even though they are quasi-imperceptible to humans.
Universal Adversarial Perturbations
TLDR
The surprising existence of universal perturbations reveals important geometric correlations among the high-dimensional decision boundary of classifiers and outlines potential security breaches with the existence of single directions in the input space that adversaries can possibly exploit to break a classifier on most natural images.
Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks
TLDR
The study shows that defensive distillation can reduce the effectiveness of adversarial sample creation from 95% to less than 0.5% on a studied DNN, and analytically investigates the generalizability and robustness properties granted by the use of defensive distillation when training DNNs.
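For reference, defensive distillation trains a second network on the temperature-softened probabilities of a first network and deploys that second network at temperature 1. The sketch below shows one training step; the temperature value, toy linear models, and optimizer settings are illustrative assumptions.

```python
# Sketch of defensive distillation: train a student on the teacher's
# temperature-softened probabilities. Temperature and model sizes are
# illustrative; the defended network is the student evaluated at T = 1.
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 20.0                                   # distillation temperature (assumption)
teacher = nn.Linear(784, 10)               # placeholder pretrained teacher
student = nn.Linear(784, 10)               # placeholder student to be defended
opt = torch.optim.SGD(student.parameters(), lr=0.1)

x = torch.randn(32, 784)                   # placeholder batch of inputs
with torch.no_grad():
    soft_targets = F.softmax(teacher(x) / T, dim=1)   # softened teacher labels

loss = F.kl_div(F.log_softmax(student(x) / T, dim=1), soft_targets,
                reduction="batchmean") * T * T        # standard T^2 scaling
opt.zero_grad()
loss.backward()
opt.step()
```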
Delving into Transferable Adversarial Examples and Black-box Attacks
TLDR
This work is the first to conduct an extensive study of transferability over large models and a large-scale dataset, and it is also the first to study the transferability of targeted adversarial examples with their target labels.
DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks
TLDR
The DeepFool algorithm is proposed to efficiently compute perturbations that fool deep networks and thus reliably quantify the robustness of these classifiers; it outperforms recent methods in the task of computing adversarial perturbations and making classifiers more robust.
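Since DeepFool is the attack named in the SafetyNet abstract, a compact sketch of the standard multiclass L2 DeepFool update may be useful: at each step the input is pushed across the nearest linearized decision boundary. The overshoot factor, iteration budget, and single-example interface are assumptions.

```python
# Compact sketch of the multiclass DeepFool update (L2 version). The overshoot
# factor, iteration budget, and number of candidate classes are assumptions.
import torch

def deepfool(model, x, num_classes=10, overshoot=0.02, max_iter=50):
    x_adv = x.clone().detach()
    orig_label = model(x_adv.unsqueeze(0)).argmax().item()
    for _ in range(max_iter):
        x_adv.requires_grad_(True)
        logits = model(x_adv.unsqueeze(0))[0]
        if logits.argmax().item() != orig_label:
            break                                  # already fooled
        grads = [torch.autograd.grad(logits[k], x_adv, retain_graph=True)[0]
                 for k in range(num_classes)]
        # For each wrong class, distance to the linearized boundary |f_k| / ||w_k||.
        best_r, best_dist = None, float("inf")
        for k in range(num_classes):
            if k == orig_label:
                continue
            w_k = grads[k] - grads[orig_label]
            f_k = (logits[k] - logits[orig_label]).item()
            dist = abs(f_k) / (w_k.norm() + 1e-8)
            if dist < best_dist:
                best_dist = dist
                best_r = (abs(f_k) / (w_k.norm() ** 2 + 1e-8)) * w_k
        x_adv = (x_adv + (1 + overshoot) * best_r).detach()
    return x_adv
```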
Explaining and Harnessing Adversarial Examples
TLDR
It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
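This paper introduces the fast gradient sign method (FGSM), a one-step attack of the form x_adv = x + eps * sign(grad_x L(x, y)). A minimal sketch follows; epsilon, the cross-entropy loss, and the [0, 1] pixel range are illustrative settings.

```python
# Minimal FGSM sketch: one signed-gradient step of size epsilon.
# epsilon and the cross-entropy loss are illustrative settings.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon=0.03):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()        # x_adv = x + eps * sign(grad_x L)
    return x_adv.clamp(0.0, 1.0).detach()      # keep pixels in a valid range
```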
Towards Evaluating the Robustness of Neural Networks
TLDR
It is demonstrated that defensive distillation does not significantly increase the robustness of neural networks, and three new attack algorithms are introduced that are successful on both distilled and undistilled neural networks with 100% probability.
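The attacks referenced here include the Carlini-Wagner L2 attack, which minimizes a perturbation norm plus a margin-based misclassification term under a tanh reparameterization. The sketch below is a simplified targeted variant with a fixed trade-off constant; the original attack binary-searches over that constant, and the optimizer settings here are assumptions.

```python
# Simplified sketch of the Carlini-Wagner L2 objective with a fixed constant c
# (the original attack binary-searches over c); optimizer settings are assumptions.
import torch

def cw_l2_attack(model, x, target, c=1.0, kappa=0.0, steps=200, lr=0.01):
    # tanh reparameterization keeps the adversarial image in [0, 1]
    w = torch.atanh((x * 2 - 1).clamp(-0.999, 0.999)).detach().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x_adv = 0.5 * (torch.tanh(w) + 1)
        logits = model(x_adv)
        target_logit = logits.gather(1, target.unsqueeze(1)).squeeze(1)
        other_logit = logits.scatter(1, target.unsqueeze(1), float("-inf")).max(dim=1).values
        # Margin term: push the target class above every other class by kappa.
        f = torch.clamp(other_logit - target_logit + kappa, min=0)
        loss = ((x_adv - x) ** 2).flatten(1).sum(dim=1) + c * f
        opt.zero_grad()
        loss.sum().backward()
        opt.step()
    return (0.5 * (torch.tanh(w) + 1)).detach()
```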
Adversarial examples in the physical world
TLDR
It is found that a large fraction of adversarial examples are classified incorrectly even when perceived through the camera, which shows that even in physical-world scenarios, machine learning systems are vulnerable to adversarial examples.
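This paper also introduced the basic iterative method, which repeats small FGSM steps and clips the result to an epsilon ball around the original image. A short sketch follows; the step size, epsilon, and iteration count are illustrative.

```python
# Sketch of the basic iterative method: repeated small FGSM steps, clipped to
# stay within an epsilon ball of the original image and in the [0, 1] range.
import torch
import torch.nn.functional as F

def basic_iterative_method(model, x, y, epsilon=0.03, alpha=0.005, steps=10):
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)  # project to eps-ball
        x_adv = x_adv.clamp(0.0, 1.0).detach()
    return x_adv
```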
Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples
TLDR
This work introduces the first practical demonstration that the cross-model transfer phenomenon enables attackers to control a remotely hosted DNN with no access to the model, its parameters, or its training data, and introduces the attack strategy of fitting a substitute model to input-output pairs observed in this manner and then crafting adversarial examples based on this auxiliary model.
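The substitute-model strategy summarized above can be sketched as a loop: label a small synthetic set by querying the remote model, fit a local substitute, augment the data with Jacobian-guided points near the decision boundary, and finally craft transferable adversarial examples against the substitute. The augmentation step size, epoch counts, and interfaces below are assumptions.

```python
# Sketch of substitute-model training for black-box attacks: query the remote
# model for labels, fit a local substitute, and augment the training set with
# Jacobian-guided points. Step size lam and epoch counts are assumptions.
import torch
import torch.nn.functional as F

def train_substitute(query_remote, substitute, x_init, rounds=3, lam=0.1, epochs=5):
    x = x_init.clone()
    opt = torch.optim.Adam(substitute.parameters(), lr=1e-3)
    for _ in range(rounds):
        y = query_remote(x)                           # only labels come from the target
        for _ in range(epochs):                       # fit the substitute on (x, y)
            loss = F.cross_entropy(substitute(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Jacobian-based dataset augmentation: step each point along the sign of
        # the gradient of its assigned class score, then add it to the pool.
        x_req = x.clone().detach().requires_grad_(True)
        score = substitute(x_req).gather(1, y.unsqueeze(1)).sum()
        grad = torch.autograd.grad(score, x_req)[0]
        x = torch.cat([x, (x + lam * grad.sign()).detach()], dim=0)
    return substitute

# Adversarial examples crafted against the substitute (e.g., with FGSM) then
# transfer to the remote model with high probability.
```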
...