Generalizable Data-Free Objective for Crafting Universal Adversarial Perturbations

@article{Mopuri2019GeneralizableDO,
  title={Generalizable Data-Free Objective for Crafting Universal Adversarial Perturbations},
  author={Konda Reddy Mopuri and Aditya Ganeshan and R. Venkatesh Babu},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2019},
  volume={41},
  pages={2452-2465}
}
Machine learning models are susceptible to adversarial perturbations: small changes to the input that can cause large changes in the output. It has also been demonstrated that there exist input-agnostic perturbations, called universal adversarial perturbations, which can change the inference of a target model on most data samples. However, existing methods to craft universal perturbations (i) are task specific, (ii) require samples from the training data distribution, and (iii) perform complex…
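
As a concrete anchor for the term, here is a minimal PyTorch sketch of how universality is typically measured (the classifier `model`, data `loader`, and perturbation `delta` are assumed to be given): one fixed delta is added to every image, and the fooling rate is the fraction of predictions that flip.

import torch

@torch.no_grad()
def fooling_rate(model, delta, loader, device="cpu"):
    # Fraction of samples whose prediction changes under one fixed,
    # input-agnostic perturbation delta.
    model.eval()
    flipped, total = 0, 0
    for images, _ in loader:
        images = images.to(device)
        clean = model(images).argmax(dim=1)
        adv = model((images + delta).clamp(0, 1)).argmax(dim=1)
        flipped += (clean != adv).sum().item()
        total += images.size(0)
    return flipped / total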

NAG: Network for Adversary Generation

TLDR
Perturbations crafted by the proposed generative approach, which models the distribution of adversarial perturbations, achieve state-of-the-art fooling rates, exhibit wide variety, and deliver excellent cross-model generalizability.
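
A minimal sketch of the generative idea summarized above, assuming an illustrative DCGAN-style decoder (the exact architecture in NAG is not reproduced here): latent noise z is mapped to an image-sized perturbation whose L-infinity norm is bounded via a tanh output scaled by eps, so sampling different z yields a distribution of diverse UAPs rather than a single fixed vector.

import torch
import torch.nn as nn

class PerturbationGenerator(nn.Module):
    def __init__(self, latent_dim=100, eps=10/255):
        super().__init__()
        self.eps = eps
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 256, 4, 1, 0), nn.ReLU(),  # 1x1 -> 4x4
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(),         # 4x4 -> 8x8
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),          # 8x8 -> 16x16
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),            # 16x16 -> 32x32
        )

    def forward(self, z):
        # tanh output lies in [-1, 1]; scaling keeps ||delta||_inf <= eps
        return self.eps * self.net(z)

# Different z samples give different perturbations from the modeled distribution.
z = torch.randn(8, 100, 1, 1)
deltas = PerturbationGenerator()(z)  # (8, 3, 32, 32)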

CRAFTING DATA-FREE UNIVERSAL ADVERSARIES

TLDR
A method to create Universal Adversarial Perturbations (UAPs) for a given CNN in a data-free manner is presented, showing that adversary generation with full training data can be approximated by a formulation that requires no data.
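
A minimal sketch of what a data-free objective of this kind can look like, under illustrative assumptions about layer choice, optimizer, and step count: the perturbation alone is fed through the network and optimized so that the activations it produces at several layers grow as large as possible, with no training image involved.

import torch

def datafree_uap(model, layers, shape=(1, 3, 224, 224), eps=10/255, steps=1000):
    model.eval()
    for p in model.parameters():
        p.requires_grad_(False)
    acts = []
    hooks = [l.register_forward_hook(lambda m, i, o: acts.append(o)) for l in layers]
    # start from small random noise so no activation norm is exactly zero
    delta = ((torch.rand(shape) * 2 - 1) * eps).requires_grad_()
    opt = torch.optim.Adam([delta], lr=0.01)
    for _ in range(steps):
        acts.clear()
        model(delta)                     # forward the perturbation alone
        # maximizing the product of norms <=> minimizing -sum of log norms
        loss = -sum(torch.log(a.norm()) for a in acts)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)      # project back into the L-inf ball
    for h in hooks:
        h.remove()
    return delta.detach()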

Generalizing Universal Adversarial Attacks Beyond Additive Perturbations

TLDR
This paper proposes GUAP, a novel unified yet flexible framework for universal adversarial attacks that can mount attacks with additive perturbations, non-additive perturbations, or a combination of both; GUAP obtains an attack success rate of up to 99.24% on the ImageNet dataset, an improvement of more than 19% over current state-of-the-art universal adversarial attacks.
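
A minimal sketch of how an additive and a non-additive universal component can be combined at attack time, assuming (as one common instantiation) that the non-additive part is a small spatial warp applied through a flow field; `flow` and `delta` stand for already-learned universal parameters and are simply given as arguments here.

import torch
import torch.nn.functional as F

def combined_attack(x, flow, delta, eps=10/255):
    # x: (B, C, H, W) in [0, 1]; flow: (1, H, W, 2); delta: (1, C, H, W)
    B, C, H, W = x.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(B, H, W, 2)
    warped = F.grid_sample(x, grid + flow, align_corners=True)  # non-additive part
    return (warped + delta.clamp(-eps, eps)).clamp(0, 1)        # additive part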

G-UAP: Generic Universal Adversarial Perturbation that Fools RPN-based Detectors

TLDR
This paper presents a novel and effective approach called G-UAP to craft universal adversarial perturbations, which can explicitly degrade the detection accuracy of a detector on a wide range of image samples.

Universal Adversarial Perturbations: A Survey

TLDR
This paper attempts to provide a detailed discussion of the various data-driven and data-independent methods for generating universal perturbations, along with measures to defend against such perturbations in various deep learning tasks.

Transferable Universal Adversarial Perturbations Using Generative Models

TLDR
This paper proposes a novel deep learning technique for generating more transferable universal adversarial perturbations (UAPs), together with a loss formulation that concentrates the adversarial energy in the first layer of the source models, which supports the transferability of the resulting UAPs to other target models.
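
A hypothetical reading of that first-layer loss, as a minimal sketch: `first_layer` stands for the first convolutional block of a source model, and the objective rewards perturbations that maximally change its response; since early layers tend to be similar across architectures, concentrating the adversarial energy there is the intuition behind the transfer claim.

import torch

def first_layer_energy_loss(first_layer, x, delta):
    # Negative so that minimizing the loss maximizes the change the
    # perturbation induces in the first feature map.
    return -(first_layer(x + delta) - first_layer(x)).norm()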

Ask, Acquire, and Attack: Data-free UAP Generation using Class Impressions

TLDR
A neural-network-based generative model is presented that utilizes the acquired class impressions to learn to craft Universal Adversarial Perturbations (UAPs); it achieves state-of-the-art success rates in the data-free scenario, close to those of the data-driven setting, without utilizing any actual data samples.
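
A minimal sketch of how a single class impression can be acquired without data, with step count and learning rate as illustrative assumptions: an input is optimized from random noise until the target model assigns it a chosen class with high confidence, and such impressions then stand in for training samples.

import torch

def class_impression(model, target_class, shape=(1, 3, 224, 224), steps=500, lr=0.05):
    model.eval()
    x = torch.rand(shape, requires_grad=True)   # start from random noise
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        logits = model(x)
        loss = -logits[0, target_class]         # maximize the target class logit
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(0, 1)                      # keep a valid image range
    return x.detach()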

Data-free Universal Adversarial Perturbation and Black-box Attack

TLDR
This work proposes to exploit artificial jigsaw images as training samples, demonstrating competitive performance, and investigates the possibility of exploiting the UAP for a data-free black-box attack, which is arguably the most practical yet challenging threat model.
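
As a loose, heavily simplified illustration of "artificial training samples" (the paper's actual jigsaw construction is more elaborate than this), one can synthesize texture-rich images from randomly colored tiles and use them in place of real data:

import torch
import torch.nn.functional as F

def jigsaw_batch(n=16, size=224, tile=28):
    # One random color per tile, upsampled to image size with hard tile edges.
    tiles = size // tile
    colors = torch.rand(n, 3, tiles, tiles)
    return F.interpolate(colors, size=size, mode="nearest")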

Data-Free Adversarial Perturbations for Practical Black-Box Attack

TLDR
This paper presents a data-free method for crafting adversarial perturbations that can fool a target model without any knowledge of the training data distribution, demonstrating that current deep learning models remain at risk even when attackers have no access to the training data.
...

References

SHOWING 1-10 OF 51 REFERENCES

Fast Feature Fool: A data independent approach to universal adversarial perturbations

TLDR
This paper proposes a novel data-independent approach to generating image-agnostic perturbations for a range of CNNs trained for object recognition, and shows that these perturbations are transferable across multiple network architectures trained on either the same or different data.

NAG: Network for Adversary Generation

TLDR
Perturbations crafted by the proposed generative approach, which models the distribution of adversarial perturbations, achieve state-of-the-art fooling rates, exhibit wide variety, and deliver excellent cross-model generalizability.

Ask, Acquire, and Attack: Data-free UAP Generation using Class Impressions

TLDR
A neural-network-based generative model is presented that utilizes the acquired class impressions to learn to craft Universal Adversarial Perturbations (UAPs); it achieves state-of-the-art success rates in the data-free scenario, close to those of the data-driven setting, without utilizing any actual data samples.

Universal Adversarial Perturbations Against Semantic Image Segmentation

TLDR
This work presents an approach for generating (universal) adversarial perturbations that make the network yield a desired target segmentation as output and shows empirically that there exist barely perceptible universal noise patterns which result in nearly the same predicted segmentation for arbitrary inputs.

Adversarial Diversity and Hard Positive Generation

TLDR
This work introduces a new psychometric perceptual adversarial similarity score (PASS) for quantifying adversarial images, introduces the notion of hard positive generation, and presents a novel hot/cold approach to adversarial example generation that provides multiple possible adversarial perturbations for every single image.

Universal Adversarial Perturbations

TLDR
The surprising existence of universal perturbations reveals important geometric correlations among the high-dimensional decision boundary of classifiers and outlines potential security breaches with the existence of single directions in the input space that adversaries can possibly exploit to break a classifier on most natural images.
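
A minimal sketch of the iterative data-driven algorithm behind this paper, with an FGSM-style inner step standing in for the DeepFool-based inner solver used by the authors, and the dataset assumed to yield (image, label) pairs: samples that delta does not yet fool contribute a small corrective update, and delta is projected back onto the L-infinity ball after each update.

import torch
import torch.nn.functional as F

def universal_perturbation(model, dataset, eps=10/255, alpha=2/255, epochs=5):
    model.eval()
    delta = torch.zeros_like(dataset[0][0]).unsqueeze(0)
    for _ in range(epochs):
        for x, _ in dataset:
            x = x.unsqueeze(0)
            with torch.no_grad():
                clean = model(x).argmax(dim=1)
            adv_in = (x + delta).clamp(0, 1).requires_grad_(True)
            logits = model(adv_in)
            if logits.argmax(dim=1).item() != clean.item():
                continue                              # already fooled: nothing to do
            loss = F.cross_entropy(logits, clean)
            loss.backward()
            delta = delta + alpha * adv_in.grad.sign()  # push off the clean label
            delta = delta.clamp(-eps, eps).detach()     # project onto the L-inf ball
    return delta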

Adversarial Machine Learning at Scale

TLDR
This research applies adversarial training to ImageNet, finds that single-step attacks are the best for mounting black-box attacks, and resolves a "label leaking" effect that causes adversarially trained models to perform better on adversarial examples than on clean examples.
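
A minimal sketch of one single-step (FGSM) adversarial training step in the spirit of this paper; the half-clean/half-adversarial mixing ratio and the step size are illustrative assumptions.

import torch
import torch.nn.functional as F

def adv_train_step(model, opt, x, y, eps=8/255):
    x_adv = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    x_adv = (x + eps * grad.sign()).clamp(0, 1).detach()  # one FGSM step

    opt.zero_grad()
    # train on a mix of clean and adversarial examples
    mixed_loss = 0.5 * F.cross_entropy(model(x), y) \
               + 0.5 * F.cross_entropy(model(x_adv), y)
    mixed_loss.backward()
    opt.step()
    return mixed_loss.item()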

Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples

TLDR
This work introduces the first practical demonstration that the cross-model transfer phenomenon enables attackers to control a remotely hosted DNN with no access to the model, its parameters, or its training data; the attack strategy fits a substitute model to input-output pairs obtained by querying the target and then crafts adversarial examples against this auxiliary model.
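
A minimal sketch of the substitute-model strategy, where `query_blackbox` is a hypothetical stand-in for the remote API and `seeds` is a small batch of synthetic inputs: the substitute is fit to the black box's input-output pairs, after which adversarial examples crafted against the substitute transfer to the remote model.

import torch
import torch.nn.functional as F

def fit_substitute(substitute, query_blackbox, seeds, epochs=10, lr=1e-3):
    labels = query_blackbox(seeds)     # only the remote model's labels are needed
    opt = torch.optim.Adam(substitute.parameters(), lr=lr)
    for _ in range(epochs):
        loss = F.cross_entropy(substitute(seeds), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return substitute  # craft adversarial examples against this local copy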

A study of the effect of JPG compression on adversarial images

TLDR
It is found that JPG compression often reverses the drop in classification accuracy to a large extent, but not always; as the magnitude of the perturbations increases, JPG recompression alone is insufficient to reverse the effect.
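
A minimal sketch of the recompression step itself, using Pillow; comparing a classifier's accuracy before and after this transform is the experiment the paper describes.

import io
from PIL import Image

def jpg_recompress(image: Image.Image, quality: int = 75) -> Image.Image:
    # Re-encode at the given JPEG quality and return the decoded result.
    buf = io.BytesIO()
    image.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    out = Image.open(buf)
    out.load()  # force decoding while the buffer is still alive
    return out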

Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks

TLDR
Two feature squeezing methods are explored, reducing the color bit depth of each pixel and spatial smoothing; both are inexpensive, complementary to other defenses, and can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks.
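
A minimal sketch of the two squeezers and the joint detection test, with the disagreement threshold as an illustrative assumption: an input is flagged when predictions on its squeezed versions diverge too far from the prediction on the raw input.

import torch
import torch.nn.functional as F

def reduce_bit_depth(x, bits=4):
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels          # quantize colors in [0, 1]

def median_smooth(x, k=3):
    # k x k median filter over a (B, C, H, W) batch via unfold.
    pad = k // 2
    patches = F.unfold(F.pad(x, (pad,) * 4, mode="reflect"), k)
    B, C = x.shape[:2]
    patches = patches.view(B, C, k * k, -1)
    return patches.median(dim=2).values.reshape(x.shape)

@torch.no_grad()
def is_adversarial(model, x, threshold=1.0):
    p = model(x).softmax(dim=1)
    scores = [(p - model(sq(x)).softmax(dim=1)).abs().sum(dim=1)
              for sq in (reduce_bit_depth, median_smooth)]
    return torch.stack(scores).max(dim=0).values > threshold   # L1 disagreement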
...