FADER: Fast Adversarial Example Rejection

@article{Crecchi2021FADERFA,
  title={FADER: Fast Adversarial Example Rejection},
  author={Francesco Crecchi and Marco Melis and Angelo Sotgiu and Davide Bacciu and Battista Biggio},
  journal={ArXiv},
  year={2021},
  volume={abs/2010.09119}
}

Citations of this paper

Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart
TLDR
The authors' rectified rejection (RR) module is evaluated under several attacks, including adaptive ones, and it is demonstrated that the RR module is compatible with different adversarial training frameworks for improving robustness, with little extra computation.
Adversarial Training with Rectified Rejection
TLDR
It is proved that under mild conditions, a rectified confidence (R-Con) rejector and a confidence rejector can be coupled to distinguish any wrongly classified input from correctly classified ones, even under adaptive attacks.
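As a rough illustration of the rectified-rejection idea summarized in the two entries above, the sketch below scales the softmax confidence by the output of an auxiliary rectifier head and abstains when the product falls below a threshold. The RRClassifier module, the sigmoid rectifier head, and the threshold value are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch, not the authors' code: a classifier with an assumed auxiliary
# "rectifier" head whose output in [0, 1] rescales the softmax confidence
# before a rejection threshold is applied.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RRClassifier(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone                    # any feature extractor
        self.cls_head = nn.Linear(feat_dim, num_classes)
        self.aux_head = nn.Linear(feat_dim, 1)      # assumed rectifier head

    def forward(self, x):
        z = self.backbone(x)
        logits = self.cls_head(z)
        rectifier = torch.sigmoid(self.aux_head(z)).squeeze(-1)  # in [0, 1]
        return logits, rectifier

@torch.no_grad()
def predict_or_reject(model, x, threshold=0.5, reject_label=-1):
    logits, rectifier = model(x)
    conf, pred = F.softmax(logits, dim=-1).max(dim=-1)
    r_con = conf * rectifier                        # rectified confidence
    pred[r_con < threshold] = reject_label          # abstain on low R-Con
    return pred, r_con
```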
Layer-wise Regularized Adversarial Training using Layers Sustainability Analysis (LSA) framework
TLDR
A novel framework, Layer Sustainability Analysis (LSA), is proposed for analyzing layer vulnerability in an arbitrary neural network under adversarial attack; it performs well theoretically and experimentally for state-of-the-art multilayer perceptron and convolutional neural network architectures.
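A minimal sketch of a layer-wise regularization term in the spirit of the LSA summary above: it accumulates, layer by layer, the distance between clean and adversarially perturbed activations and adds it to the training loss. The distance measure and the uniform weighting are assumptions rather than the paper's exact formulation.

```python
# Hedged sketch: the per-layer distance and the uniform weighting below are
# assumptions, not the paper's exact formulation.
import torch

def layer_sustainability_penalty(feature_layers, x_clean, x_adv):
    """Accumulate distances between clean and perturbed activations, layer by layer."""
    penalty = torch.zeros((), device=x_clean.device)
    h_c, h_a = x_clean, x_adv
    for layer in feature_layers:                  # e.g. the children of an nn.Sequential
        h_c, h_a = layer(h_c), layer(h_a)
        penalty = penalty + (h_c - h_a).flatten(1).norm(dim=1).mean()
    return penalty

# Usage inside an adversarial-training step (x_adv produced by any attack):
#   loss = task_loss + lam * layer_sustainability_penalty(layers, x, x_adv)
```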
Deep-RBF Networks for Anomaly Detection in Automotive Cyber-Physical Systems
TLDR
This paper designs deep-RBF networks using popular DNNs such as NVIDIA DAVE-II and ResNet20, uses the resulting rejection class for detecting adversarial attacks such as physical attacks and data poisoning attacks, and shows that the deep-RBF networks can robustly detect these attacks in a short time without additional resource requirements.
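The sketch below illustrates the general deep-RBF idea referenced above under common assumptions: the output layer scores each class by an RBF of the distance to a learned center, and inputs far from every center fall into the rejection class. DeepRBFHead and its hyperparameters are illustrative, not the paper's code.

```python
# Hedged sketch of a deep-RBF output layer with rejection; the class name,
# gamma, and threshold are illustrative assumptions.
import torch
import torch.nn as nn

class DeepRBFHead(nn.Module):
    def __init__(self, feat_dim, num_classes, gamma=1.0):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.gamma = gamma

    def forward(self, z):                         # z: (batch, feat_dim) features
        d2 = torch.cdist(z, self.centers).pow(2)  # squared distance to each center
        return torch.exp(-self.gamma * d2)        # RBF similarity per class

def classify_with_rejection(scores, tau=0.5, reject_label=-1):
    best, pred = scores.max(dim=-1)
    pred[best < tau] = reject_label               # no center is close enough: reject
    return pred
```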

References

SHOWING 1-10 OF 65 REFERENCES
Deep neural rejection against adversarial examples
TLDR
This work proposes a deep neural rejection mechanism to detect adversarial examples, based on the idea of rejecting samples that exhibit anomalous feature representations at different network layers, and empirically shows that this approach outperforms previously proposed methods that detect adversarial examples by analyzing only the feature representation provided by the output network layer.
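A simplified, hedged sketch of the layer-wise rejection idea in the entry above: per-layer classifiers are fit on intermediate representations, their scores are combined by a second classifier, and inputs whose top combined score is low are rejected. The use of scikit-learn SVC and the class name LayerwiseRejector are assumptions for illustration.

```python
# Hedged, simplified sketch of multi-layer rejection; not the paper's released code.
import numpy as np
from sklearn.svm import SVC

class LayerwiseRejector:
    def __init__(self, threshold=0.0):
        self.layer_svms, self.combiner, self.threshold = [], None, threshold

    def fit(self, layer_feats, y):                # layer_feats: list of (n, d_l) arrays
        self.layer_svms = [SVC(kernel="rbf", probability=True).fit(F, y)
                           for F in layer_feats]
        stacked = np.hstack([svm.predict_proba(F)
                             for svm, F in zip(self.layer_svms, layer_feats)])
        self.combiner = SVC(kernel="rbf", probability=True).fit(stacked, y)
        return self

    def predict(self, layer_feats, reject_label=-1):
        stacked = np.hstack([svm.predict_proba(F)
                             for svm, F in zip(self.layer_svms, layer_feats)])
        proba = self.combiner.predict_proba(stacked)
        pred = self.combiner.classes_[proba.argmax(axis=1)]
        pred[proba.max(axis=1) < self.threshold] = reject_label   # anomalous: reject
        return pred
```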
Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models
TLDR
The proposed Defense-GAN, a new framework leveraging the expressive capability of generative models to defend deep neural networks against adversarial perturbations, is empirically shown to be consistently effective against different attack methods and improves on existing defense strategies.
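A hedged sketch of the Defense-GAN projection step described above: before classification, the input is replaced by the closest image a pretrained generator can produce, found by gradient descent over the latent code with random restarts. The generator interface and the step/restart counts are placeholders.

```python
# Hedged sketch: G is assumed to be a pretrained generator mapping a
# (batch, latent_dim) latent code to images; all hyperparameters are placeholders.
import torch

def project_onto_generator(G, x, latent_dim, steps=200, restarts=10, lr=0.05):
    best_rec, best_err = None, float("inf")
    for _ in range(restarts):
        z = torch.randn(x.size(0), latent_dim, requires_grad=True)
        opt = torch.optim.SGD([z], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            err = ((G(z) - x) ** 2).flatten(1).sum(dim=1).mean()
            err.backward()
            opt.step()
        with torch.no_grad():
            e = ((G(z) - x) ** 2).sum().item()
            if e < best_err:
                best_err, best_rec = e, G(z).detach()
    return best_rec      # feed this reconstruction to the classifier instead of x
```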
Detecting Adversarial Samples from Artifacts
TLDR
This paper investigates model confidence on adversarial samples by looking at Bayesian uncertainty estimates, available in dropout neural networks, and by performing density estimation in the subspace of deep features learned by the model, and results show a method for implicit adversarial detection that is oblivious to the attack algorithm.
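A minimal sketch of the two detection signals summarized above, assuming a dropout-equipped classifier: Monte Carlo dropout predictive variance and a class-conditional kernel density score on deep features; a simple classifier trained on the pair then separates clean from adversarial inputs. Bandwidth and pass counts are placeholders.

```python
# Hedged sketch of dropout-uncertainty and density features for detection;
# hyperparameters are placeholders, not the paper's settings.
import torch
import numpy as np
from sklearn.neighbors import KernelDensity

def mc_dropout_uncertainty(model, x, passes=20):
    model.train()                      # keep dropout active at inference time
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(passes)])
    return probs.var(dim=0).sum(dim=-1)          # predictive variance per sample

def fit_class_kdes(train_feats, train_labels, bandwidth=1.0):
    return {c: KernelDensity(bandwidth=bandwidth).fit(train_feats[train_labels == c])
            for c in np.unique(train_labels)}

def density_score(kdes, feats, preds):
    return np.array([kdes[int(c)].score_samples(f[None, :])[0]
                     for f, c in zip(feats, preds)])

# A logistic regression fit on (uncertainty, density) pairs then separates
# clean inputs from adversarial ones.
```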
On Detecting Adversarial Perturbations
TLDR
It is shown empirically that adversarial perturbations can be detected surprisingly well even though they are quasi-imperceptible to humans.
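A hedged sketch of the detector-subnetwork approach suggested by the entry above: a small binary classifier is attached to an intermediate feature map and trained to tell clean inputs from adversarially perturbed ones while the backbone stays frozen. The detector architecture is an assumption for illustration.

```python
# Hedged sketch; the detector architecture and attachment point are assumptions.
import torch
import torch.nn as nn

class PerturbationDetector(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 1),                     # logit: adversarial vs. clean
        )

    def forward(self, feature_map):
        return self.net(feature_map).squeeze(-1)

# Training uses BCEWithLogitsLoss with label 1 for adversarial inputs and 0 for
# clean ones, while the backbone classifier stays frozen.
```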
MagNet: A Two-Pronged Defense against Adversarial Examples
TLDR
MagNet, a framework for defending neural network classifiers against adversarial examples, is proposed, and it is shown empirically that MagNet is effective against the most advanced state-of-the-art attacks in black-box and gray-box scenarios without sacrificing the false positive rate on normal examples.
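A compact, hedged sketch of MagNet's two components as summarized above: a detector that thresholds the autoencoder reconstruction error and a reformer that classifies the reconstruction instead of the raw input. The error threshold is a placeholder.

```python
# Hedged sketch of detector + reformer; threshold is a placeholder, and the
# autoencoder is assumed to be trained on clean data only.
import torch

@torch.no_grad()
def magnet_predict(classifier, autoencoder, x, err_threshold=0.05, reject_label=-1):
    recon = autoencoder(x)
    err = ((recon - x) ** 2).flatten(1).mean(dim=1)   # per-sample reconstruction error
    pred = classifier(recon).argmax(dim=-1)           # "reformed" input is classified
    pred[err > err_threshold] = reject_label          # detector: far from data manifold
    return pred, err
```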
Towards Deep Learning Models Resistant to Adversarial Attacks
TLDR
This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
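The robust-optimization view summarized above is commonly instantiated as PGD-based adversarial training; the sketch below shows the inner maximization (PGD within an L-infinity ball) and the outer minimization step. Step sizes, epsilon, and iteration counts are placeholders.

```python
# Hedged sketch of PGD adversarial training; hyperparameters are placeholders.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)   # random start
    x_adv = x_adv.clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()      # ascend the loss
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)  # project to eps-ball
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    model.eval()                              # fix BN statistics while attacking
    x_adv = pgd_attack(model, x, y)           # inner maximization
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)   # outer minimization on worst-case inputs
    loss.backward()
    optimizer.step()
    return loss.item()
```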
Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks
TLDR
The study shows that defensive distillation can reduce the effectiveness of adversarial sample creation from 95% to less than 0.5% on a studied DNN, and analytically investigates the generalizability and robustness properties granted by the use of defensive distillation when training DNNs.
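A hedged sketch of defensive distillation as summarized above: a teacher is trained with a high-temperature softmax, and a second network of the same architecture is then trained on the teacher's soft labels at the same temperature. The temperature value is a placeholder.

```python
# Hedged sketch of the distillation step; T is a placeholder temperature.
import torch
import torch.nn.functional as F

def soft_labels(teacher, x, T=20.0):
    with torch.no_grad():
        return F.softmax(teacher(x) / T, dim=-1)

def distillation_loss(student_logits, teacher_soft, T=20.0):
    log_p = F.log_softmax(student_logits / T, dim=-1)
    return -(teacher_soft * log_p).sum(dim=-1).mean()   # cross-entropy on soft labels

# Training loop (teacher already trained with softmax at temperature T):
#   for x, _ in loader:
#       loss = distillation_loss(student(x), soft_labels(teacher, x))
#       loss.backward(); optimizer.step(); optimizer.zero_grad()
# At test time the student's softmax is evaluated at temperature 1.
```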
Towards Evaluating the Robustness of Neural Networks
TLDR
It is demonstrated that defensive distillation does not significantly increase the robustness of neural networks, and three new attack algorithms are introduced that succeed on both distilled and undistilled neural networks with 100% probability.
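A simplified, hedged sketch of an optimization-based attack in the Carlini-Wagner style evaluated in the work above: it minimizes the L2 perturbation norm plus a weighted margin loss that vanishes once the model misclassifies. The tanh change of variables and the binary search over the trade-off constant from the paper are omitted.

```python
# Hedged, simplified sketch; c, steps, lr, and kappa are placeholders.
import torch
import torch.nn.functional as F

def cw_l2_attack(model, x, y, c=1.0, steps=200, lr=0.01, kappa=0.0):
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_adv = (x + delta).clamp(0, 1)
        logits = model(x_adv)
        true_logit = logits.gather(1, y.unsqueeze(1)).squeeze(1)
        mask = F.one_hot(y, logits.size(1)).bool()
        other_logit = logits.masked_fill(mask, float("-inf")).max(dim=1).values
        margin = torch.clamp(true_logit - other_logit + kappa, min=0)  # 0 once misclassified
        loss = (delta ** 2).flatten(1).sum(dim=1).mean() + c * margin.mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta).clamp(0, 1).detach()
```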
Fortified Networks: Improving the Robustness of Deep Networks by Modeling the Manifold of Hidden Representations
TLDR
Fortified Networks are proposed: a simple transformation of existing networks that fortifies the hidden layers of a deep network by identifying when the hidden states are off the data manifold and mapping them back to parts of the data manifold where the network performs well.
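A hedged sketch of a "fortified" hidden layer in the spirit of the summary above: a small denoising autoencoder is inserted after a hidden layer, its reconstruction error is added to the training objective, and its output is what the next layer consumes. The module shape and noise level are assumptions.

```python
# Hedged sketch; bottleneck size and noise level are illustrative assumptions.
import torch
import torch.nn as nn

class FortifiedLayer(nn.Module):
    def __init__(self, dim, bottleneck=64, noise_std=0.1):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, bottleneck), nn.ReLU())
        self.dec = nn.Linear(bottleneck, dim)
        self.noise_std = noise_std

    def forward(self, h):
        h_noisy = h + self.noise_std * torch.randn_like(h) if self.training else h
        h_rec = self.dec(self.enc(h_noisy))
        rec_loss = ((h_rec - h.detach()) ** 2).mean()   # added to the task loss
        return h_rec, rec_loss                          # next layer consumes h_rec
```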
Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning
TLDR
The DkNN algorithm is evaluated on several datasets, and it is shown that the confidence estimates accurately identify inputs outside the model's training distribution, and that the explanations provided by nearest neighbors are intuitive and useful in understanding model failures.
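A simplified, hedged sketch of deep k-nearest-neighbor inference as summarized above: training representations are indexed per layer, a test point's neighbors are gathered across layers, and the fraction of neighbor labels agreeing with each class serves as the confidence estimate; the calibration step of the full algorithm is omitted.

```python
# Hedged, simplified sketch of layer-wise k-NN inference; k is a placeholder
# and the conformal calibration step is omitted.
import numpy as np
from sklearn.neighbors import NearestNeighbors

class SimpleDkNN:
    def __init__(self, k=75):
        self.k, self.indexes, self.labels = k, [], None

    def fit(self, layer_feats, y):                    # layer_feats: list of (n, d_l)
        self.indexes = [NearestNeighbors(n_neighbors=self.k).fit(F) for F in layer_feats]
        self.labels = np.asarray(y)
        return self

    def predict_with_confidence(self, layer_feats):
        votes = []
        for index, F in zip(self.indexes, layer_feats):
            _, nbrs = index.kneighbors(F)             # (n_test, k) neighbor ids
            votes.append(self.labels[nbrs])
        votes = np.concatenate(votes, axis=1)         # neighbor labels over all layers
        classes = np.unique(self.labels)
        counts = np.stack([(votes == c).sum(axis=1) for c in classes], axis=1)
        pred = classes[counts.argmax(axis=1)]
        conf = counts.max(axis=1) / votes.shape[1]    # fraction of agreeing neighbors
        return pred, conf
```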
...