Minority Reports Defense: Defending Against Adversarial Patches

@inproceedings{McCoyd2020MinorityRD,
  title={Minority Reports Defense: Defending Against Adversarial Patches},
  author={Michael McCoyd and Won Park and Steven Chen and Neil Shah and Ryan Roggenkemper and Minjune Hwang and Jason Xinyu Liu and David A. Wagner},
  booktitle={ACNS Workshops},
  year={2020}
}
Deep learning image classification is vulnerable to adversarial attack, even if the attacker changes just a small patch of the image. We propose a defense against patch attacks based on partially occluding the image around each candidate patch location, so that a few occlusions each completely hide the patch. We demonstrate on CIFAR-10, Fashion MNIST, and MNIST that our defense provides certified security against patch attacks of a certain size. 
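
As a rough illustration of the occlude-and-vote idea (not the authors' exact training or certification procedure), the Python sketch below slides a gray occluder over a grid of candidate patch locations, classifies each occluded copy, and only returns a label when all votes agree. The occluder size, stride, unanimity rule, and the classify callable are illustrative assumptions.

import numpy as np

def occlusion_votes(image, classify, occ_size=20, stride=5):
    # Classify copies of `image` with a gray square occluder slid over a
    # grid of candidate patch locations. `classify` maps an HxWxC float
    # image in [0, 1] to a class id. Sizes here are illustrative only.
    h, w = image.shape[:2]
    votes = []
    for y in range(0, h - occ_size + 1, stride):
        for x in range(0, w - occ_size + 1, stride):
            occluded = image.copy()
            occluded[y:y + occ_size, x:x + occ_size] = 0.5
            votes.append(classify(occluded))
    return votes

def minority_reports_predict(image, classify, **kwargs):
    # Accept a prediction only if all occluded copies agree; otherwise
    # abstain (return None). A real deployment additionally certifies that
    # any patch of the defended size is fully hidden by some occluder.
    votes = occlusion_votes(image, classify, **kwargs)
    return votes[0] if len(set(votes)) == 1 else None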

Defending against Adversarial Patches with Robust Self-Attention

TLDR
A new defense against adversarial patch attacks based on the proposed Robust Self-Attention (RSA) layer, which replaces the outlier-sensitive weighted-mean operation used by standard self-attention with a robust aggregation mechanism that detects and masks outlier tokens.
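
The RSA layer itself is not given in this snippet; the sketch below only conveys the general flavor of replacing attention's weighted mean with an outlier-masking aggregation. The median/MAD scoring rule and the threshold are stand-ins of my own, not the paper's detector.

import torch

def robust_attention_pool(attn, values, z_thresh=3.0):
    # attn: (B, N) attention weights, values: (B, N, D) value vectors.
    # Tokens whose value vectors deviate strongly from the coordinate-wise
    # median (by a MAD-normalized score) are masked out before the weighted
    # mean, instead of letting a few corrupted tokens dominate the average.
    med = values.median(dim=1, keepdim=True).values              # (B, 1, D)
    mad = (values - med).abs().median(dim=1, keepdim=True).values + 1e-6
    score = ((values - med).abs() / mad).mean(dim=-1)            # (B, N)
    keep = (score < z_thresh).float()
    w = attn * keep
    w = w / w.sum(dim=1, keepdim=True).clamp_min(1e-6)           # renormalize
    return torch.einsum('bn,bnd->bd', w, values)                 # (B, D)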

PatchGuard: A Provably Robust Defense against Adversarial Patches via Small Receptive Fields and Masking

TLDR
This paper proposes a general defense framework called PatchGuard that can achieve high provable robustness while maintaining high clean accuracy against localized adversarial patches, and presents the robust masking defense that robustly detects and masks corrupted features to recover the correct prediction.
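
As a simplified sketch of the robust-masking step (the window size and the one-window-per-class masking rule are simplifications of the paper's actual procedure), the code below takes a map of local class evidence from a network with small receptive fields, masks the highest-evidence window for each class, and aggregates what remains.

import numpy as np

def robust_masking_predict(local_logits, win=2):
    # local_logits: (H, W, C) class evidence from a network with small
    # receptive fields. For each class, mask the win x win window with the
    # highest evidence for that class (where a patch could hide), then
    # re-aggregate. A simplified sketch, not PatchGuard's full algorithm.
    H, W, C = local_logits.shape
    scores = np.zeros(C)
    for c in range(C):
        ev = local_logits[:, :, c]
        best, by, bx = -np.inf, 0, 0
        for y in range(H - win + 1):
            for x in range(W - win + 1):
                s = ev[y:y + win, x:x + win].sum()
                if s > best:
                    best, by, bx = s, y, x
        masked = ev.copy()
        masked[by:by + win, bx:bx + win] = 0.0   # remove suspicious window
        scores[c] = masked.sum()
    return int(scores.argmax())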

PatchGuard: Provable Defense against Adversarial Patches Using Masks on Small Receptive Fields

TLDR
The robust masking defense that robustly detects and masks corrupted features to recover the correct prediction is presented, which achieves state-of-the-art provable robust accuracy on ImageNette, ImageNet, and CIFAR-10 datasets.

PatchCleanser: Certifiably Robust Defense against Adversarial Patches for Any Image Classifier

TLDR
It is proved that PatchCleanser will always predict the correct class labels on certain images against any adaptive white-box attacker within the authors' threat model, achieving certified robustness.
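
PatchCleanser's inference is based on two rounds of pixel masking ("double-masking"); the sketch below follows that idea loosely, omitting the paper's mask-set construction and certification analysis, and the callable names are mine.

def double_masking_predict(image, classify, masks):
    # masks: list of boolean HxW arrays; the mask set is assumed to be
    # constructed so that every possible patch location is fully covered
    # by at least one mask (that construction is omitted here).
    def apply(img, m):
        out = img.copy()
        out[m] = 0.5
        return out
    first = [classify(apply(image, m)) for m in masks]
    majority = max(set(first), key=first.count)
    if all(p == majority for p in first):        # first-round agreement
        return majority
    # second round: re-check each disagreeing first-round mask
    for m, p in zip(masks, first):
        if p == majority:
            continue
        masked_once = apply(image, m)
        second = [classify(apply(masked_once, m2)) for m2 in masks]
        if all(q == p for q in second):          # the disagreer is consistent
            return p
    return majority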

Certified defense against patch attacks via mask-guided randomized smoothing

The adversarial patch is a practical and effective method that modifies a small region on an image, making DNNs fail to classify. Existing empirical defenses against adversarial patch attacks lack

Segment and Complete: Defending Object Detectors against Adversarial Patch Attacks with Robust Patch Detection

TLDR
This paper proposes Segment and Complete defense (SAC), a general framework for defending object detectors against patch attacks through detection and removal of adversarial patches, and presents the APRICOT-Mask dataset, which augments the APRICOT dataset with pixel-level annotations of adversarial patches.
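
A rough sketch of the segment-then-remove pipeline follows; the patch segmenter and the object detector are passed in as callables whose interfaces are my assumptions, and a crude bounding-box fill stands in for the paper's learned shape completion.

import numpy as np

def segment_and_complete_defense(image, segment_patch, detect):
    # segment_patch: image -> HxW float map of per-pixel patch probability.
    # detect: image -> detector output (boxes, scores, ...).
    prob = segment_patch(image)
    mask = prob > 0.5
    if mask.any():
        ys, xs = np.where(mask)
        y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
        cleaned = image.copy()
        cleaned[y0:y1, x0:x1] = 0.0      # remove the suspected patch region
    else:
        cleaned = image
    return detect(cleaned)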

Zero-Shot Certified Defense against Adversarial Patches with Vision Transformers

TLDR
PatchVeto is proposed, a zero-shot certified defense against adversarial patches based on Vision Transformer (ViT) models that can achieve high accuracy on clean inputs while detecting adversarially patched inputs by simply manipulating the attention map of ViT.

Adversarial Patch Attacks and Defences in Vision-Based Tasks: A Survey

TLDR
An overview of existing adversarial patch attack techniques is provided to help interested researchers quickly catch up with the progress, and existing techniques for detecting and defending against adversarial patches are discussed to help the community better understand this type of attack and its applications in the real world.

PatchGuard++: Efficient Provable Attack Detection against Adversarial Patches

TLDR
This paper extends PatchGuard to PatchGuard++ for provably detecting adversarial patch attacks, boosting both provable robust accuracy and clean accuracy, and demonstrates that PatchGuard++ significantly improves provable robustness and clean performance.

Realizable Patch Attacks via Randomized Cropping

  • Computer Science
  • 2020
This paper proposes a certifiable defense against adversarial patch attacks on image classification. Our approach classifies random crops from the original image independently and classifies the
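
The snippet above is cut off, but the stated mechanism of classifying random crops independently and aggregating their predictions is enough for a small sketch; crop size, crop count, and majority voting are illustrative choices rather than the paper's settings.

import numpy as np

def random_crop_vote(image, classify, crop=24, n_crops=50, rng=None):
    # Classify n_crops random crops of size crop x crop independently and
    # return the majority class. A patch can only corrupt the crops that
    # overlap it, which is the intuition behind the certificate.
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    votes = []
    for _ in range(n_crops):
        y = rng.integers(0, h - crop + 1)
        x = rng.integers(0, w - crop + 1)
        votes.append(classify(image[y:y + crop, x:x + crop]))
    return max(set(votes), key=votes.count)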

References

On Visible Adversarial Perturbations & Digital Watermarking

  • Jamie Hayes
  • Computer Science
    2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
  • 2018
TLDR
Under this threat model, adversarial perturbations transform images such that the model's output is classified as an attacker-chosen class; attacks that can bypass the proposed defenses are also discussed.

Defending Against Physically Realizable Attacks on Image Classification

TLDR
A new abstract adversarial model is proposed, rectangular occlusion attacks, in which an adversary places a small adversarially crafted rectangle in an image, and two approaches for efficiently computing the resulting adversarial examples are developed.

LaVAN: Localized and Visible Adversarial Noise

TLDR
It is shown that it is possible to generate localized adversarial noises that cover only 2% of the pixels in the image, none of them over the main object, and that are transferable across images and locations, and successfully fool a state-of-the-art Inception v3 model with very high success rates.
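
A minimal PyTorch sketch of a localized-noise attack in this spirit appears below: a fixed patch location, plain gradient descent toward a target class, and clamping to the valid pixel range. Step count, sizes, and the optimizer are illustrative and this is not the paper's exact optimization.

import torch
import torch.nn.functional as F

def localized_noise_attack(model, image, target, y0, x0, size=6,
                           steps=200, lr=0.05):
    # image: (1, C, H, W) in [0, 1]; only the size x size window at
    # (y0, x0) is modified, everything else stays untouched.
    patch = image[:, :, y0:y0 + size, x0:x0 + size].clone().requires_grad_(True)
    opt = torch.optim.Adam([patch], lr=lr)
    for _ in range(steps):
        adv = image.clone()
        adv[:, :, y0:y0 + size, x0:x0 + size] = patch
        loss = F.cross_entropy(model(adv), torch.tensor([target]))
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            patch.clamp_(0.0, 1.0)       # keep pixels in the valid range
    adv = image.clone()
    adv[:, :, y0:y0 + size, x0:x0 + size] = patch.detach()
    return adv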

Local Gradients Smoothing: Defense Against Localized Adversarial Attacks

TLDR
This work develops an effective method to estimate the noise location in the gradient domain and transform the high-activation regions caused by adversarial noise in the image domain, while having minimal effect on the salient object that is important for correct classification.
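
A crude stand-in for the scheme (not the paper's exact formulation): estimate first-order gradient magnitude, find blocks where it is unusually high, and damp the local contrast there. Block size, threshold, and damping factor below are illustrative assumptions.

import numpy as np

def local_gradient_suppress(image, block=16, thresh=0.1, scale=0.2):
    # image: HxW (or HxWxC) float array in [0, 1]. Blocks with unusually
    # high mean gradient magnitude (likely adversarial high-frequency
    # noise) have their contrast pulled toward the block mean.
    gray = image if image.ndim == 2 else image.mean(axis=-1)
    gy, gx = np.gradient(gray)
    mag = np.sqrt(gx ** 2 + gy ** 2)
    out = image.astype(float)
    h, w = gray.shape
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            if mag[y:y + block, x:x + block].mean() > thresh:
                region = out[y:y + block, x:x + block]
                out[y:y + block, x:x + block] = (
                    region.mean() + scale * (region - region.mean()))
    return out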

Fooling Automated Surveillance Cameras: Adversarial Patches to Attack Person Detection

TLDR
The goal is to generate a patch that is able to successfully hide a person from a person detector, and this work is the first to attempt this kind of attack on targets with a high level of intra-class variety like persons.

Adversarial T-Shirt! Evading Person Detectors in a Physical World

TLDR
This is the first work that models the effect of deformation for designing physical adversarial examples with respect to non-rigid objects such as T-shirts, and shows that the proposed method achieves 74% and 57% attack success rates in the digital and physical worlds respectively against YOLOv2 and Faster R-CNN.

Robust Physical-World Attacks on Machine Learning Models

TLDR
This paper proposes a new attack algorithm, Robust Physical Perturbations (RP2), that generates perturbations by taking images under different conditions into account and can create spatially constrained perturbations that mimic vandalism or art to reduce the likelihood of detection by a casual observer.
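
A compact sketch of the underlying idea of optimizing one perturbation over many imaging conditions, restricted to a mask, appears below; the transformation set, mask handling, and printability terms of the real RP2 objective are omitted or assumed.

import torch
import torch.nn.functional as F

def eot_masked_attack(model, images, mask, target, steps=100, lr=0.01):
    # images: list of (1, C, H, W) photos of the same object under
    # different conditions; mask: (1, 1, H, W) binary region (e.g. the
    # sign area) where perturbation is allowed. Minimizes the average
    # target-class loss over all conditions; printability and other terms
    # of the real RP2 objective are intentionally left out.
    delta = torch.zeros_like(images[0], requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    tgt = torch.tensor([target])
    for _ in range(steps):
        loss = sum(F.cross_entropy(model((img + delta * mask).clamp(0, 1)), tgt)
                   for img in images) / len(images)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (delta * mask).detach()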

Adversarial machine learning

TLDR
A taxonomy for classifying attacks against online machine learning algorithms is given, together with the limits of an adversary's knowledge about the algorithm, feature space, training, and input data.

Explaining and Harnessing Adversarial Examples

TLDR
It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
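
The one-step attack introduced in this paper, the fast gradient sign method (FGSM), follows directly from the linearity argument: perturb each pixel by epsilon in the direction of the sign of the loss gradient. A minimal PyTorch version:

import torch
import torch.nn.functional as F

def fgsm(model, image, label, eps=8 / 255):
    # image: (B, C, H, W) in [0, 1]; label: (B,) long tensor.
    # A single step of size eps in the direction that increases the loss.
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adv = image + eps * image.grad.sign()
    return adv.clamp(0, 1).detach()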

Towards Evaluating the Robustness of Neural Networks

TLDR
It is demonstrated that defensive distillation does not significantly increase the robustness of neural networks, and three new attack algorithms are introduced that succeed on both distilled and undistilled neural networks with 100% probability.
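
One of the attacks in this paper is the L2 Carlini-Wagner attack; the sketch below captures its core recipe (a tanh reparameterization plus a margin-based objective weighted by a constant c), without the binary search over c and other refinements of the full attack.

import torch

def cw_l2_attack(model, image, target, c=1.0, kappa=0.0,
                 steps=500, lr=0.01):
    # image: (1, C, H, W) in [0, 1]; target is the class to push toward.
    # Optimize in tanh space so the adversarial image stays in [0, 1].
    w = torch.atanh((image * 2 - 1).clamp(-0.999999, 0.999999)).detach()
    w.requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        adv = (torch.tanh(w) + 1) / 2
        logits = model(adv)
        target_logit = logits[0, target]
        other = logits[0].clone()
        other[target] = -float('inf')
        # margin term: drive the target logit above all other logits
        margin = torch.clamp(other.max() - target_logit, min=-kappa)
        loss = ((adv - image) ** 2).sum() + c * margin
        opt.zero_grad()
        loss.backward()
        opt.step()
    return ((torch.tanh(w) + 1) / 2).detach()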