Corpus ID: 195820512

Detecting and Diagnosing Adversarial Images with Class-Conditional Capsule Reconstructions

@article{Qin2020DetectingAD,
  title={Detecting and Diagnosing Adversarial Images with Class-Conditional Capsule Reconstructions},
  author={Yao Qin and Nicholas Frosst and Sara Sabour and Colin Raffel and G. Cottrell and Geoffrey E. Hinton},
  journal={ArXiv},
  year={2020},
  volume={abs/1907.02957}
}
Adversarial examples raise questions about whether neural network models are sensitive to the same visual features as humans. In this paper, we first detect adversarial examples or otherwise corrupted images based on a class-conditional reconstruction of the input. To specifically attack our detection mechanism, we propose the Reconstructive Attack which seeks both to cause a misclassification and a low reconstruction error. This reconstructive attack produces undetected adversarial examples…
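The abstract describes two mechanisms: rejecting inputs whose class-conditional reconstruction error is high, and an attack that jointly optimizes for misclassification and a low reconstruction error. Below is a minimal sketch of both ideas, assuming a generic PyTorch model that exposes hypothetical classify(x) and reconstruct(x, y) methods; the threshold theta, step size alpha, and weight lam are illustrative placeholders, not values from the paper.

import torch

def detect(model, x, theta):
    """Flag x as adversarial if the reconstruction conditioned on the
    winning class is far from the input (L2 error above theta)."""
    logits = model.classify(x)                # assumed shape: (batch, num_classes)
    y_pred = logits.argmax(dim=1)
    recon = model.reconstruct(x, y_pred)      # class-conditional reconstruction
    err = ((recon - x) ** 2).flatten(1).sum(dim=1)
    return err > theta                        # True -> reject as adversarial

def reconstructive_attack_step(model, x, x_adv, y_true, eps, alpha, lam):
    """One PGD-style step on a joint objective: increase classification loss
    while keeping the class-conditional reconstruction error low, so the
    resulting example can slip past the detector above."""
    x_adv = x_adv.clone().detach().requires_grad_(True)
    logits = model.classify(x_adv)
    y_pred = logits.argmax(dim=1)
    recon = model.reconstruct(x_adv, y_pred)
    ce = torch.nn.functional.cross_entropy(logits, y_true)
    rec = ((recon - x_adv) ** 2).flatten(1).sum(dim=1).mean()
    loss = ce - lam * rec      # ascend: raise misclassification, lower recon error
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + alpha * x_adv.grad.sign()
        x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0, 1)
    return x_adv.detach()
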
Detecting Adversarial Patches with Class Conditional Reconstruction Networks
TLDR
This investigation uses an adversarial detection method based on autoencoder architectures and performs adversarial patch experiments on MNIST, SVHN, and CIFAR10 against a CNN architecture and two CapsNet architectures, showing that the detector retains some of its effectiveness even against adaptive adversarial patch attacks.
Self-Supervised Adversarial Example Detection by Disentangled Representation
TLDR
By disentangling input images into class features and semantic features, an autoencoder is trained over both correctly paired and incorrectly paired class/semantic features to reconstruct benign examples and counterexamples, which mimics the behavior of adversarial examples and reduces the unnecessary generalization ability of the autoencoder.
Stabilized Medical Image Attacks
TLDR
This paper proposes an image-based medical adversarial attack method to consistently produce adversarial perturbations on medical images, and analyzes the KL-divergence of the proposed loss function, finding that the loss stabilization term drives the perturbations toward a fixed objective spot while deviating from the ground truth.
Learning to Separate Clusters of Adversarial Representations for Robust Adversarial Detection
TLDR
This paper treats non-robust features as a common property of adversarial examples, deduces that it is possible to find a cluster in representation space corresponding to this property, and derives a new probabilistic adversarial detector motivated by the recently introduced notion of non-robust features.
Effective and Efficient Vote Attack on Capsule Networks
TLDR
This work proposes a novel vote attack that targets the votes inside CapsNets, since the inner workings of CapsNets change when the output capsules are attacked, and integrates the vote attack into the detection-aware attack paradigm, which can successfully bypass the class-conditional reconstruction based detection method.
Advances in adversarial attacks and defenses in computer vision: A survey
TLDR
A literature review of the contributions made by the computer vision community to adversarial attacks on deep learning, complementing an earlier survey that covered work up to 2018 by focusing on the advances in this area since 2018.
Threat of Adversarial Attacks on Deep Learning in Computer Vision: Survey II
TLDR
A literature review of the contributions made by the computer vision community to adversarial attacks on deep learning, complementing an earlier survey that covered work up to 2018 by focusing on the advances in this area since 2018.
Improving White-box Robustness of Pre-processing Defenses via Joint Adversarial Training
TLDR
A method called Joint Adversarial Training based Pre-processing (JATP) defense is proposed, which effectively mitigates the robustness degradation effect across different target models compared to previous state-of-the-art approaches.
DAAIN: Detection of Anomalous and Adversarial Input using Normalizing Flows
TLDR
This work introduces a novel technique, DAAIN, to detect out-of-distribution (OOD) inputs and adversarial attacks for image segmentation in a unified setting, using an ESPNet trained on the Cityscapes dataset as the segmentation model, an affine normalizing flow as the density estimator, and blue noise to ensure homogeneous sampling.
WaveGuard: Understanding and Mitigating Audio Adversarial Examples
TLDR
WaveGuard is introduced: a framework for detecting adversarial inputs crafted to attack ASR systems. It is empirically demonstrated that audio transformations that recover audio from perceptually informed representations can lead to a strong defense that is robust against an adaptive adversary even in a complete white-box setting.
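The detection recipe this summary points to can be sketched in a few lines: transcribe the input and a perceptually transformed copy of it, and flag the input when the two transcriptions disagree too much. In the sketch below, asr_transcribe and perceptual_transform are assumed callables (an ASR system and, e.g., a mel-spectrogram analysis/resynthesis step), and the threshold tau is an illustrative value rather than one taken from the paper.

def char_error_rate(ref: str, hyp: str) -> float:
    """Levenshtein distance between the two strings, normalized by len(ref)."""
    d = list(range(len(hyp) + 1))
    for i, rc in enumerate(ref, start=1):
        prev, d[0] = d[0], i
        for j, hc in enumerate(hyp, start=1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (rc != hc))
    return d[len(hyp)] / max(len(ref), 1)

def is_adversarial(audio, asr_transcribe, perceptual_transform, tau=0.5):
    """Benign audio keeps roughly the same transcription after the
    transformation; adversarial perturbations tend not to survive it."""
    original = asr_transcribe(audio)
    recovered = asr_transcribe(perceptual_transform(audio))
    return char_error_rate(original, recovered) > tau
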

References

SHOWING 1-10 OF 45 REFERENCES
Adversarial Examples Detection in Deep Networks with Convolutional Filter Statistics
  Xin Li, Fuxin Li · 2017 IEEE International Conference on Computer Vision (ICCV)
TLDR
After detecting adversarial examples, it is shown that many of them can be recovered by simply applying a small average filter to the image, which should lead to more insights about the classification mechanisms in deep convolutional neural networks.
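As a rough illustration of the recovery step mentioned in this summary, the sketch below smooths a flagged image with a small mean filter and re-classifies it; classifier is an assumed callable and the 3x3 window is an illustrative choice, not a value from the paper.

import numpy as np
from scipy.ndimage import uniform_filter

def recover_and_classify(classifier, image: np.ndarray, size: int = 3):
    """Apply a size x size mean filter per channel, then classify the result."""
    sizes = (size, size) if image.ndim == 2 else (size, size, 1)
    smoothed = uniform_filter(image.astype(np.float32), size=sizes)
    return classifier(smoothed)
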
PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples
Adversarial perturbations of normal images are usually imperceptible to humans, but they can seriously confuse state-of-the-art machine learning models. What makes them so special in the eyes of…
Detecting Adversarial Samples from Artifacts
TLDR
This paper investigates model confidence on adversarial samples by looking at Bayesian uncertainty estimates, available in dropout neural networks, and by performing density estimation in the subspace of deep features learned by the model; the results show a method for implicit adversarial detection that is oblivious to the attack algorithm.
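A minimal sketch of the two detection signals this summary mentions, assuming a PyTorch model with dropout layers and a hypothetical model.features(x) hook that exposes deep features; the sample count and any decision thresholds are illustrative.

import torch
from sklearn.neighbors import KernelDensity

def mc_dropout_uncertainty(model, x, n_samples=20):
    """Predictive variance across stochastic forward passes with dropout on."""
    model.train()  # keep dropout active at test time
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(dim=1) for _ in range(n_samples)])
    return probs.var(dim=0).sum(dim=1)          # higher -> more uncertain

def feature_density(kde: KernelDensity, model, x):
    """Log-density of deep features under a KDE fit on clean training features."""
    model.eval()
    with torch.no_grad():
        feats = model.features(x).flatten(1).cpu().numpy()
    return kde.score_samples(feats)             # lower -> more likely adversarial
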
The Robust Manifold Defense: Adversarial Training using Generative Models
We propose a new type of attack for finding adversarial examples for image classifiers. Our method exploits spanners, i.e. deep neural networks whose input space is low-dimensional and whose output…
On Detecting Adversarial Perturbations
TLDR
It is shown empirically that adversarial perturbations can be detected surprisingly well even though they are quasi-imperceptible to humans.
With Friends Like These, Who Needs Adversaries?
TLDR
This analysis resolves the apparent contradiction between accuracy and vulnerability, provides a new perspective on much of the prior art, and reveals profound implications for efforts to construct neural nets that are both accurate and robust to adversarial attack.
MagNet: A Two-Pronged Defense against Adversarial Examples
TLDR
MagNet, a framework for defending neural network classifiers against adversarial examples, is proposed; it is shown empirically that MagNet is effective against the most advanced state-of-the-art attacks in black-box and gray-box scenarios without sacrificing the false positive rate on normal examples.
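The two-pronged structure summarized here can be sketched roughly as follows, assuming a trained autoencoder, a classifier, and a detector threshold theta chosen on clean validation data; this is an illustration of the idea, not the authors' released implementation.

import torch

def magnet_predict(autoencoder, classifier, x, theta):
    with torch.no_grad():
        recon = autoencoder(x)
        err = ((recon - x) ** 2).flatten(1).mean(dim=1)   # detector score
        accepted = err <= theta
        # Reformer: classify the reconstruction, not the raw input.
        preds = classifier(recon).argmax(dim=1)
    return preds, accepted      # accepted == False means "reject as adversarial"
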
Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models
TLDR
The proposed Defense-GAN, a new framework leveraging the expressive capability of generative models to defend deep neural networks against adversarial perturbations, is empirically shown to be consistently effective against different attack methods and improves on existing defense strategies.
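A hedged sketch of the projection step this summary alludes to: search the generator's latent space for the code whose image best matches the input, then classify that reconstruction. generator, classifier, and all hyperparameters below are placeholders, not the paper's settings, and the restart selection is done per batch here for brevity.

import torch

def defense_gan_predict(generator, classifier, x, z_dim=128,
                        n_restarts=10, n_steps=200, lr=0.05):
    best_z, best_err = None, float("inf")
    for _ in range(n_restarts):
        z = torch.randn(x.shape[0], z_dim, requires_grad=True)
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(n_steps):
            opt.zero_grad()
            err = ((generator(z) - x) ** 2).flatten(1).sum(dim=1).mean()
            err.backward()
            opt.step()
        if err.item() < best_err:           # keep the closest projection found
            best_err, best_z = err.item(), z.detach()
    with torch.no_grad():
        return classifier(generator(best_z)).argmax(dim=1)
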
Adversarial Spheres
TLDR
A fundamental tradeoff between the amount of test error and the average distance to the nearest error is shown, proving that any model which misclassifies a small constant fraction of a sphere will be vulnerable to adversarial perturbations of size $O(1/\sqrt{d})$.
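An informal restatement of the scaling behind that claim, writing $E$ for the error set on the unit sphere $S^{d-1}$ with uniform measure $\mu$ (the exact constants and conditions are in the paper; this version just follows concentration of measure on the sphere):

\[
\mu(E) \ge \mu_0 > 0
\quad\Longrightarrow\quad
\operatorname{median}_{x \sim \mu}\, d(x, E) \;\le\; \frac{C\sqrt{\log(1/\mu_0)}}{\sqrt{d}} \;=\; O\!\left(\tfrac{1}{\sqrt{d}}\right),
\]

so a classifier that errs on any constant fraction of the sphere admits nearby errors at distance shrinking like $1/\sqrt{d}$.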
Early Methods for Detecting Adversarial Images
TLDR
The best detection method reveals that adversarial images place abnormal emphasis on the lower-ranked principal components from PCA, and adversaries trying to bypass detectors must make the adversarial image less pathological or they will fail trying.
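A minimal sketch of the PCA statistic this summary describes, assuming PCA is fit on flattened clean training images; the component split k and any decision threshold are illustrative choices, not values from the paper.

import numpy as np
from sklearn.decomposition import PCA

def fit_pca(clean_images: np.ndarray, n_components: int) -> PCA:
    """clean_images: (n, d) array of flattened clean training images."""
    return PCA(n_components=n_components).fit(clean_images)

def low_rank_emphasis(pca: PCA, image: np.ndarray, k: int) -> float:
    """Share of the retained PCA energy that lands on components ranked below k."""
    coords = pca.transform(image.reshape(1, -1))[0]   # coefficient per component
    total = np.sum(coords ** 2) + 1e-12
    return float(np.sum(coords[k:] ** 2) / total)      # high -> suspicious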