Corpus ID: 9059612

Adversarial Machine Learning at Scale

@article{Kurakin2017AdversarialML,
  title={Adversarial Machine Learning at Scale},
  author={Alexey Kurakin and Ian J. Goodfellow and Samy Bengio},
  journal={ArXiv},
  year={2017},
  volume={abs/1611.01236}
}
Adversarial examples are malicious inputs designed to fool machine learning models. They often transfer from one model to another, allowing attackers to mount black box attacks without knowledge of the target model's parameters. Adversarial training is the process of explicitly training a model on adversarial examples, in order to make it more robust to attack or to reduce its test error on clean inputs. So far, adversarial training has primarily been applied to small problems. In this research… 
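To make the abstract's terms concrete, the sketch below crafts fast gradient sign method (FGSM) adversarial examples and trains on a batch that mixes clean and perturbed inputs. It is a minimal illustrative sketch in PyTorch under assumed conventions (inputs scaled to [0, 1], hypothetical helper names `fgsm_examples` and `adversarial_training_step`, equal weighting of clean and adversarial loss), not the paper's exact recipe.

```python
# Illustrative sketch of FGSM-based adversarial training (not the paper's exact recipe).
# Assumes a PyTorch classifier `model`, a loss `loss_fn`, an `optimizer`,
# a batch (x, y) with inputs in [0, 1], and a perturbation budget `epsilon`.
import torch

def fgsm_examples(model, loss_fn, x, y, epsilon):
    """Craft FGSM adversarial examples: x_adv = x + epsilon * sign(grad_x loss)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    return (x_adv + epsilon * grad.sign()).clamp(0.0, 1.0).detach()

def adversarial_training_step(model, loss_fn, optimizer, x, y, epsilon=0.03):
    """One training step on a batch mixing clean and FGSM-perturbed inputs."""
    x_adv = fgsm_examples(model, loss_fn, x, y, epsilon)
    optimizer.zero_grad()
    loss = 0.5 * (loss_fn(model(x), y) + loss_fn(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()
```

The equal clean/adversarial weighting here is an assumption for brevity; the relative weighting and the choice of single-step versus iterative attacks are exactly the kinds of design decisions the paper and the works below study.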

Citations

Ensemble Adversarial Training: Attacks and Defenses
TLDR
This work finds that adversarial training remains vulnerable both to black-box attacks, in which perturbations computed on undefended models are transferred, and to a powerful novel single-step attack that escapes the non-smooth vicinity of the input data via a small random step.
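For context, the single-step attack mentioned in this summary prepends a small random step to the gradient step. The sketch below is a hedged Python rendering of that idea under assumed conventions (the name `rand_plus_fgsm` is hypothetical, inputs in [0, 1], alpha < epsilon), not the authors' reference implementation.

```python
# Hedged sketch: a single-step attack with a small random step before the gradient step.
import torch

def rand_plus_fgsm(model, loss_fn, x, y, epsilon=0.06, alpha=0.03):
    # Random step first, to move away from the non-smooth region around x.
    x_rand = (x + alpha * torch.randn_like(x).sign()).clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_rand), y)
    grad, = torch.autograd.grad(loss, x_rand)
    # Gradient step with the remaining budget (epsilon - alpha).
    return (x_rand + (epsilon - alpha) * grad.sign()).clamp(0.0, 1.0).detach()
```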
On Improving the Effectiveness of Adversarial Training
  • Yi Qin, Ryan Hunt, Chuan Yue
  • Computer Science
    Proceedings of the ACM International Workshop on Security and Privacy Analytics - IWSPA '19
  • 2019
TLDR
An adversarial training experimental framework is designed to answer two research questions; the study finds that MBEAT is indeed beneficial, indicating that it has important practical value, and that RGOAT indeed exists, indicating that adversarial training should be an iterative process.
Adversarial Attacks on Neural Network Policies
TLDR
This work shows existing adversarial example crafting techniques can be used to significantly degrade test-time performance of trained policies, even with small adversarial perturbations that do not interfere with human perception.
Efficient Two-Step Adversarial Defense for Deep Neural Networks
TLDR
This paper empirically demonstrates the effectiveness of the proposed two-step defense approach against different attack methods and its improvements over existing defense strategies, providing a level of robustness against adversarial attacks comparable to that of adversarial training with multi-step adversarial examples.
Towards Model-Agnostic Adversarial Defenses using Adversarially Trained Autoencoders
TLDR
This work proposes Adversarially-Trained Autoencoder Augmentation (AAA), the first model-agnostic adversarial defense that is robust against certain adaptive adversaries, and shows that it can be used to create a fully model-agnostic defense for the MNIST and Fashion-MNIST datasets.
Using Single-Step Adversarial Training to Defend Iterative Adversarial Examples
TLDR
This work proposes a novel single-step adversarial training method that can defend against both single-step and iterative adversarial examples, and demonstrates the scalability of the approach and its performance advantages over SOTA single-step approaches.
Improving the Generalization of Adversarial Training with Domain Adaptation
TLDR
Empirical evaluations demonstrate that ATDA can greatly improve the generalization of adversarial training and the smoothness of the learned models, and outperforms state-of-the-art methods on standard benchmark datasets.
Efficient Adversarial Training With Transferable Adversarial Examples
TLDR
This paper shows that there is high transferability between models from neighboring epochs in the same training process, i.e., adversarial examples from one epoch continue to be adversarial in subsequent epochs, and proposes a novel method, Adversarial Training with Transferable Adversarial Examples (ATTA), that can enhance the robustness of trained models and greatly improve training efficiency by accumulating adversarial perturbations through epochs.
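To illustrate the idea of accumulating perturbations across epochs, the sketch below carries each example's perturbation from the previous epoch as the starting point of the attack in the current epoch. It is a hedged, simplified Python sketch (the function name, the single gradient step per epoch, and the omission of valid-range clipping are assumptions for brevity), not the ATTA algorithm itself.

```python
# Hedged sketch: reuse the previous epoch's perturbation as the attack's starting point.
import torch

def attack_with_carryover(model, loss_fn, x, y, delta_prev, epsilon=0.03, step=0.01):
    # Start from the perturbation carried over from the previous epoch.
    x_adv = (x + delta_prev).clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    # One gradient step this epoch; project back into the epsilon-ball around x.
    delta = (x_adv + step * grad.sign() - x).clamp(-epsilon, epsilon).detach()
    return x + delta, delta  # delta is stored and reused in the next epoch
```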
Direction-Aggregated Attack for Transferable Adversarial Examples
TLDR
This paper proposes the Direction-Aggregated adversarial attack, which delivers transferable adversarial examples, significantly improves their transferability, and outperforms state-of-the-art attacks, especially against adversarially trained models.
...

References

SHOWING 1-10 OF 26 REFERENCES
Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
TLDR
New transferability attacks between previously unexplored (substitute, victim) pairs of machine learning model classes, most notably SVMs and decision trees, are introduced.
Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples
TLDR
This work introduces the first practical demonstration that the cross-model transfer phenomenon enables attackers to control a remotely hosted DNN with no access to the model, its parameters, or its training data, and introduces the attack strategy of fitting a substitute model to input-output pairs collected in this manner, then crafting adversarial examples based on this auxiliary model.
Adversarial examples in the physical world
TLDR
It is found that a large fraction of adversarial examples are classified incorrectly even when perceived through the camera, which shows that even in physical-world scenarios, machine learning systems are vulnerable to adversarial examples.
Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks
TLDR
The study shows that defensive distillation can reduce the effectiveness of adversarial sample creation from 95% to less than 0.5% on a studied DNN, and analytically investigates the generalizability and robustness properties granted by the use of defensive distillation when training DNNs.
Are Accuracy and Robustness Correlated?
TLDR
It is demonstrated that better machine learning models are less vulnerable to adversarial examples, and that adversarial examples are mostly transferable across similar network topologies.
Explaining and Harnessing Adversarial Examples
TLDR
It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
Evasion Attacks against Machine Learning at Test Time
TLDR
This work presents a simple but effective gradient-based approach that can be exploited to systematically assess the security of several, widely-used classification algorithms against evasion attacks.
Virtual Adversarial Training for Semi-Supervised Text Classification
TLDR
This work extends adversarial and virtual adversarial training to the text domain by applying perturbations to the word embeddings in a recurrent neural network rather than to the original input itself.
Distributional Smoothing with Virtual Adversarial Training
TLDR
When the LDS based regularization was applied to supervised and semi-supervised learning for the MNIST dataset, it outperformed all the training methods other than the current state of the art method, which is based on a highly advanced generative model.
Learning with a Strong Adversary
TLDR
A new and simple way of finding adversarial examples is presented and experimentally shown to be efficient, greatly improving the robustness of the classification models produced.
...