• Corpus ID: 238857179

Adversarial examples by perturbing high-level features in intermediate decoder layers

  title={Adversarial examples by perturbing high-level features in intermediate decoder layers},
  author={Vojtěch {\vC}erm{\'a}k and Luk{\'a}s Adam},
We propose a novel method for creating adversarial examples. Instead of perturbing pixels, we use an encoder-decoder representation of the input image and perturb intermediate layers in the decoder. This changes the high-level features provided by the generative model. Therefore, our perturbation possesses semantic meaning, such as a longer beak or green tints. We formulate this task as an optimization problem by minimizing the Wasserstein distance between the adversarial and initial images… 

Figures and Tables from this paper


Constructing Unrestricted Adversarial Examples with Generative Models
The empirical results on the MNIST, SVHN, and CelebA datasets show that unrestricted adversarial examples can bypass strong adversarial training and certified defense methods designed for traditional adversarial attacks.
Wasserstein Adversarial Examples via Projected Sinkhorn Iterations
A new threat model for adversarial attacks based on the Wasserstein distance is proposed, which can successfully attack image classification models, and it is demonstrated that PGD-based adversarial training can improve this adversarial accuracy to 76%.
Generating Adversarial Examples with Adversarial Networks
Adversarial examples generated by AdvGAN on different target models have high attack success rate under state-of-the-art defenses compared to other attacks, and have placed the first with 92.76% accuracy on a public MNIST black-box attack challenge.
The Limitations of Deep Learning in Adversarial Settings
This work formalizes the space of adversaries against deep neural networks (DNNs) and introduces a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs.
Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models
The proposed Defense-GAN, a new framework leveraging the expressive capability of generative models to defend deep neural networks against adversarial perturbations, is empirically shown to be consistently effective against different attack methods and improves on existing defense strategies.
Large Scale Adversarial Representation Learning
This work builds upon the state-of-the-art BigGAN model, extending it to representation learning by adding an encoder and modifying the discriminator, and demonstrates that these generation-based models achieve the state of the art in unsupervised representation learning on ImageNet, as well as in unconditional image generation.
MagNet: A Two-Pronged Defense against Adversarial Examples
MagNet, a framework for defending neural network classifiers against adversarial examples, is proposed and it is shown empirically that MagNet is effective against the most advanced state-of-the-art attacks in blackbox and graybox scenarios without sacrificing false positive rate on normal examples.
Towards Deep Learning Models Resistant to Adversarial Attacks
This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
Explaining and Harnessing Adversarial Examples
It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
Adversarial Machine Learning at Scale
This research applies adversarial training to ImageNet and finds that single-step attacks are the best for mounting black-box attacks, and resolution of a "label leaking" effect that causes adversarially trained models to perform better on adversarial examples than on clean examples.