• Corpus ID: 16716974

Adversarial Transformation Networks: Learning to Generate Adversarial Examples

Shumeet Baluja, Ian S. Fischer

Multiple approaches to generating adversarial examples have been proposed to attack deep neural networks. […] We call such a network an Adversarial Transformation Network (ATN). ATNs are trained to generate adversarial examples that minimally modify the classifier's outputs given the original input, while constraining the new classification to match an adversarial target class. We present methods to train ATNs and analyze their effectiveness targeting a variety of MNIST classifiers as…
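The training objective described above can be sketched as a joint loss: an input term keeping the adversarial example close to the original, plus an output term pushing the classifier toward a reranked target distribution. The reranking idea follows the ATN paper, but the L2 choices for both terms and the constants `alpha` and `beta` below are illustrative assumptions:

```python
import numpy as np

def rerank(y, target, alpha=1.5):
    """Reranking target r_alpha(y, t): boost the target class to
    alpha * max(y), keep the relative order of the other classes,
    then renormalize to a probability vector. The value of alpha
    here is illustrative."""
    r = y.copy()
    r[target] = alpha * y.max()
    return r / r.sum()

def atn_loss(x, x_adv, y, y_adv, target, beta=0.1, alpha=1.5):
    """Sketch of the ATN training objective:
    beta * L_input(x_adv, x) + L_output(y_adv, r_alpha(y, target)).
    Plain L2 is used for both terms for simplicity."""
    l_input = np.sum((x_adv - x) ** 2)
    l_output = np.sum((y_adv - rerank(y, target, alpha)) ** 2)
    return beta * l_input + l_output
```

Minimizing this loss over the generator's parameters trades off imperceptibility (the input term) against hitting the adversarial target class (the output term).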

Generating Adversarial Examples with Adversarial Networks

Adversarial examples generated by AdvGAN on different target models have a high attack success rate under state-of-the-art defenses compared to other attacks, and placed first with 92.76% accuracy on a public MNIST black-box attack challenge.

A Direct Approach to Robust Deep Learning Using Adversarial Networks

This paper models the adversarial noise using a generative network, trained jointly with a discriminative classification network as a minimax game, and shows empirically that this adversarial network approach works well against black-box attacks, with performance on par with state-of-the-art methods such as ensemble adversarial training and adversarial training with projected gradient descent.


AT-GAN: Adversarial Transfer on Generative Adversarial Net (2020)

AT-GAN (Adversarial Transfer on Generative Adversarial Net) is proposed to train an adversarial generative model that can directly produce adversarial examples; it can efficiently generate diverse adversarial examples that are realistic to human perception, and yields higher attack success rates against adversarially trained models.

Generating Adversarial Examples with Graph Neural Networks

It is shown that this method beats state-of-the-art adversarial attacks, including the PGD attack, MI-FGSM, and the Carlini-Wagner attack, reducing the time required to generate adversarial examples with small perturbation norms by over 65%, and achieves good generalization performance on unseen networks.

Generalizable Adversarial Attacks Using Generative Models

This work frames the problem as learning a distribution of adversarial perturbations, enabling it to generate diverse adversarial perturbations given an unperturbed input, and shows that this framework is domain-agnostic in that the same framework can be employed to attack different input domains with minimal modification.

Adversarial Defense via Learning to Generate Diverse Attacks

This work proposes a recursive and stochastic generator that produces much stronger and diverse perturbations that comprehensively reveal the vulnerability of the target classifier.

Constructing Unrestricted Adversarial Examples with Generative Models

The empirical results on the MNIST, SVHN, and CelebA datasets show that unrestricted adversarial examples can bypass strong adversarial training and certified defense methods designed for traditional adversarial attacks.

NAG: Network for Adversary Generation

Perturbations crafted by the proposed generative approach to model the distribution of adversarial perturbations achieve state-of-the-art fooling rates, exhibit wide variety and deliver excellent cross model generalizability.

Detecting Adversarial Examples Through Image Transformation

An effective method is presented to detect adversarial examples in image classification by introducing randomness into the image-transformation process; it achieves a detection rate of around 70%.
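As a toy illustration of the idea, an input can be flagged as adversarial when its predicted label is unstable under transformations of the input, since adversarial perturbations tend to be brittle. The function names, callables, and the majority-flip threshold below are illustrative assumptions, not the paper's exact procedure:

```python
def detect_by_transformation(classify, transform, x, n_trials=10, threshold=0.5):
    """Flag x as adversarial if applying `transform` (in the paper,
    a randomized image transformation) flips the predicted label in
    more than `threshold` of `n_trials` trials. `classify` and
    `transform` are assumed user-supplied callables."""
    base = classify(x)
    flips = sum(classify(transform(x)) != base for _ in range(n_trials))
    return flips / n_trials > threshold
```

In practice `transform` would be a random crop, shift, or rescale of the image; a clean input far from the decision boundary keeps its label, while an adversarial input sitting just across the boundary tends to flip back.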

Machine Learning as an Adversarial Service: Learning Black-Box Adversarial Examples

A direct attack against black-box neural networks is introduced that uses a second, attacker neural network to learn to craft adversarial examples that transfer to different machine learning models such as Random Forests, SVMs, and K-Nearest Neighbors.

The Limitations of Deep Learning in Adversarial Settings

This work formalizes the space of adversaries against deep neural networks (DNNs) and introduces a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs.

Adversarial Machine Learning at Scale

This research applies adversarial training to ImageNet, finds that single-step attacks are the best for mounting black-box attacks, and resolves a "label leaking" effect that causes adversarially trained models to perform better on adversarial examples than on clean examples.

Explaining and Harnessing Adversarial Examples

It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
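The linearity argument motivates the Fast Gradient Sign Method (FGSM) introduced in this paper: take a single step of size epsilon in the sign of the input gradient of the loss. A minimal sketch using a toy logistic-regression model (the weights and inputs in the usage are made up for illustration):

```python
import numpy as np

def fgsm(x, grad, eps=0.1):
    """Fast Gradient Sign Method: one linear step of size eps in the
    direction that increases the loss, x_adv = x + eps * sign(dJ/dx)."""
    return x + eps * np.sign(grad)

def loss_grad(w, x, y):
    """Gradient of binary cross-entropy w.r.t. the input x for a
    toy linear model p = sigmoid(w . x): dJ/dx = (p - y) * w."""
    p = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
    return (p - y) * w
```

Because the perturbation is a single sign step, it is cheap to compute and bounded in the L-infinity norm by eps, which is why it scales to large datasets.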

On Detecting Adversarial Perturbations

It is shown empirically that adversarial perturbations can be detected surprisingly well even though they are quasi-imperceptible to humans.

Towards Evaluating the Robustness of Neural Networks

It is demonstrated that defensive distillation does not significantly increase the robustness of neural networks, and three new attack algorithms are introduced that succeed on both distilled and undistilled neural networks with 100% probability.
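The best known of these attacks minimizes a perturbation norm plus a margin term on the raw logits. A sketch of the Carlini-Wagner L2 objective (the constant c and confidence kappa are hyperparameters; the identity logits function used in the usage example is an assumption, not a real model):

```python
import numpy as np

def cw_objective(delta, x, logits_fn, target, c=1.0, kappa=0.0):
    """Carlini-Wagner L2 objective (sketch):
        ||delta||_2^2 + c * f(x + delta),
    where f(x') = max(max_{i != t} Z(x')_i - Z(x')_t, -kappa).
    Minimizing drives the target logit above all others while keeping
    the perturbation small. logits_fn is assumed to return raw
    (pre-softmax) logits Z."""
    z = logits_fn(x + delta)
    other = np.max(np.delete(z, target))
    f = max(other - z[target], -kappa)
    return np.sum(delta ** 2) + c * f
```

The full attack minimizes this objective with gradient descent (plus a change of variables to keep pixels in range); the sketch only shows the loss being minimized.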

Delving into Transferable Adversarial Examples and Black-box Attacks

This work is the first to conduct an extensive study of transferability over large models and a large-scale dataset, and it is also the first to study the transferability of targeted adversarial examples with their target labels.

Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples

This work introduces the first practical demonstration that the cross-model transfer phenomenon enables attackers to control a remotely hosted DNN with no access to the model, its parameters, or its training data, and introduces the attack strategy of fitting a substitute model to input-output pairs obtained in this manner, then crafting adversarial examples based on this auxiliary model.
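The substitute-model strategy can be sketched end to end on a toy problem: query the black box for labels only, fit a local substitute, then run a white-box attack (FGSM here) on the substitute and transfer the result. The linear-logistic substitute and all constants below are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def train_substitute(oracle, X, lr=0.1, epochs=200):
    """Fit a linear logistic substitute to the black-box oracle's labels
    on a small query set X. The oracle is only asked for labels, never
    for gradients or parameters."""
    y = np.array([oracle(x) for x in X], dtype=float)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(X)
    return w

def craft_transfer_attack(w, x, eps=0.5):
    """FGSM on the substitute: its white-box gradient stands in for the
    inaccessible black-box gradient (the transferability assumption)."""
    p = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
    y = float(p > 0.5)          # attack the substitute's own prediction
    grad = (p - y) * w
    return x + eps * np.sign(grad)
```

If the substitute's decision boundary roughly tracks the oracle's, an example crafted against the substitute also fools the black box, which is the transfer effect the paper exploits.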

Adversarial examples in the physical world

It is found that a large fraction of adversarial examples are classified incorrectly even when perceived through a camera, which shows that even in physical-world scenarios, machine learning systems are vulnerable to adversarial examples.

Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples

New transferability attacks are introduced between previously unexplored (substitute, victim) pairs of machine learning model classes, most notably SVMs and decision trees.

Generative Adversarial Nets

We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
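The adversarial process is the two-player minimax game on V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))], which D maximizes and G minimizes. A minimal Monte-Carlo estimate of this value from D's outputs (a sketch of the objective only, not the paper's training loop):

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Monte-Carlo estimate of the GAN minimax value
        V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))],
    given D's probability outputs on real samples (d_real) and on
    generated samples (d_fake). D maximizes V; G minimizes it."""
    d_real = np.asarray(d_real)
    d_fake = np.asarray(d_fake)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))
```

At the game's equilibrium the discriminator outputs 0.5 everywhere and V equals -2 log 2; a discriminator that separates real from fake well scores higher, which is the gradient signal both players train against.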