Adversarial Machine Learning at Scale
@article{Kurakin2017AdversarialML, title={Adversarial Machine Learning at Scale}, author={Alexey Kurakin and Ian J. Goodfellow and Samy Bengio}, journal={ArXiv}, year={2017}, volume={abs/1611.01236} }
Adversarial examples are malicious inputs designed to fool machine learning models. They often transfer from one model to another, allowing attackers to mount black box attacks without knowledge of the target model's parameters. Adversarial training is the process of explicitly training a model on adversarial examples, in order to make it more robust to attack or to reduce its test error on clean inputs. So far, adversarial training has primarily been applied to small problems. In this research…
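For context, the scaled-up adversarial training the abstract refers to alternates between crafting single-step (FGSM-style) adversarial examples and training on a mix of clean and perturbed inputs. The sketch below is a minimal, illustrative PyTorch version of that loop, assuming a classifier `model`, a `DataLoader`, inputs scaled to [0, 1], and a perturbation budget `epsilon`; it is not the paper's exact recipe.

```python
# Minimal sketch of FGSM-based adversarial training (one crafting step per batch).
# Assumes a PyTorch classifier `model`, a DataLoader `loader`, inputs in [0, 1],
# and a perturbation budget `epsilon`; all names are illustrative.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon):
    """Craft one-step FGSM examples: x' = x + epsilon * sign(grad_x loss)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    return (x_adv + epsilon * grad.sign()).clamp(0.0, 1.0).detach()

def adversarial_training_epoch(model, loader, optimizer, epsilon, adv_fraction=0.5):
    """Train on a mix of clean and FGSM inputs, as in single-step adversarial training."""
    model.train()
    for x, y in loader:
        x_adv = fgsm_perturb(model, x, y, epsilon)
        # Replace a fraction of the batch with its adversarial counterpart;
        # label order is unchanged, so `y` still matches `x_mixed`.
        k = int(adv_fraction * x.size(0))
        x_mixed = torch.cat([x_adv[:k], x[k:]], dim=0)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_mixed), y)
        loss.backward()
        optimizer.step()
```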
1,951 Citations
Ensemble Adversarial Training: Attacks and Defenses
- Computer Science, ICLR
- 2018
This work finds that adversarial training remains vulnerable to black-box attacks, where perturbations computed on undefended models are transferred, as well as to a powerful novel single-step attack that escapes the non-smooth vicinity of the input data via a small random step.
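The "small random step" mentioned above refers to prepending a random perturbation before the single gradient step, so the attack is linearized away from the non-smooth point at the data itself. A rough PyTorch sketch of that idea (the names and the split of the budget between the random and gradient steps are illustrative, not taken from the paper):

```python
# Rough sketch of a single-step attack that takes a small random step before the
# gradient step, so the gradient is computed away from any non-smooth point at
# the data itself. Names and defaults are illustrative; inputs assumed in [0, 1].
import torch
import torch.nn.functional as F

def rand_fgsm_perturb(model, x, y, epsilon, alpha):
    assert alpha < epsilon
    # Random step: move to a nearby point first.
    x_rand = x + alpha * torch.sign(torch.randn_like(x))
    x_rand = x_rand.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_rand), y)
    grad, = torch.autograd.grad(loss, x_rand)
    # Gradient step with the remaining budget, then clamp to the valid range.
    return (x_rand + (epsilon - alpha) * grad.sign()).clamp(0.0, 1.0).detach()
```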
On Improving the Effectiveness of Adversarial Training
- Computer Science, Proceedings of the ACM International Workshop on Security and Privacy Analytics - IWSPA '19
- 2019
An adversarial training experimental framework is designed to answer two research questions; it finds that MBEAT is indeed beneficial, indicating that it has important value in practice, and that RGOAT indeed exists, indicating that adversarial training should be an iterative process.
Adversarial Attacks on Neural Network Policies
- Computer Science, ICLR
- 2017
This work shows existing adversarial example crafting techniques can be used to significantly degrade test-time performance of trained policies, even with small adversarial perturbations that do not interfere with human perception.
Efficient Two-Step Adversarial Defense for Deep Neural Networks
- Computer Science, ArXiv
- 2018
This paper empirically demonstrates the effectiveness of the proposed two-step defense against different attack methods and its improvements over existing defense strategies, achieving a robustness level comparable to that of adversarial training with multi-step adversarial examples.
Towards Model-Agnostic Adversarial Defenses using Adversarially Trained Autoencoders
- Computer Science
- 2019
This work proposes Adversarially-Trained Autoencoder Augmentation (AAA), the first model-agnostic adversarial defense that is robust against certain adaptive adversaries, and shows that it can be used to create a fully model-agnostic defense for the MNIST and Fashion-MNIST datasets.
Using Single-Step Adversarial Training to Defend Iterative Adversarial Examples
- Computer Science, CODASPY
- 2021
This work proposes a novel single-step adversarial training method that can defend against both single-step and iterative adversarial examples, and demonstrates the scalability of the approach and its performance advantages over state-of-the-art single-step approaches.
Improving the Generalization of Adversarial Training with Domain Adaptation
- Computer Science, ICLR
- 2019
Empirical evaluations demonstrate that ATDA can greatly improve the generalization of adversarial training and the smoothness of the learned models, and outperforms state-of-the-art methods on standard benchmark datasets.
Efficient Adversarial Training With Transferable Adversarial Examples
- Computer Science, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
This paper shows that there is high transferability between models from neighboring epochs in the same training process, i.e., adversarial examples from one epoch continue to be adversarial in subsequent epochs, and proposes a novel method, Adversarial Training with Transferable Adversarial Examples (ATTA), that can enhance the robustness of trained models and greatly improve training efficiency by accumulating adversarial perturbations through epochs.
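As a loose illustration of accumulating perturbations through epochs, the sketch below keeps one persistent perturbation per training example, refines it with a single attack step each epoch, and trains on the perturbed input; the names, the per-example loop, and the projection details are assumptions for illustration, not the paper's implementation.

```python
# Sketch of reusing perturbations across epochs: the perturbation found for each
# example in one epoch initializes the attack in the next, so single attack steps
# can accumulate into a stronger perturbation over training (illustrative only).
# Assumes an indexable `dataset` yielding (tensor image in [0, 1], int label).
import torch
import torch.nn.functional as F

def train_with_accumulated_perturbations(model, dataset, optimizer,
                                         epsilon, step_size, epochs):
    # One persistent perturbation buffer per training example.
    delta = {i: torch.zeros_like(dataset[i][0]) for i in range(len(dataset))}
    for _ in range(epochs):
        for i in range(len(dataset)):
            x, y = dataset[i]
            x, y = x.unsqueeze(0), torch.tensor([y])
            d = delta[i].unsqueeze(0).clone().requires_grad_(True)
            loss = F.cross_entropy(model(x + d), y)
            grad, = torch.autograd.grad(loss, d)
            # One attack step, projected back onto the epsilon ball, then stored.
            d_new = (d + step_size * grad.sign()).clamp(-epsilon, epsilon).detach()
            delta[i] = d_new.squeeze(0)
            # Train on the (re)perturbed example.
            optimizer.zero_grad()
            F.cross_entropy(model((x + d_new).clamp(0.0, 1.0)), y).backward()
            optimizer.step()
```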
Direction-Aggregated Attack for Transferable Adversarial Examples
- Computer Science, ACM Journal on Emerging Technologies in Computing Systems
- 2022
This paper proposes Direction-Aggregated adversarial attacks that deliver transferable adversarial examples, significantly improving their transferability and outperforming state-of-the-art attacks, especially against adversarially trained models.
References
Showing 1-10 of 26 references
Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
- Computer Science, ArXiv
- 2016
New transferability attacks between previously unexplored (substitute, victim) pairs of machine learning model classes, most notably SVMs and decision trees, are introduced.
Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples
- Computer Science, ArXiv
- 2016
This work introduces the first practical demonstration that the cross-model transfer phenomenon enables attackers to control a remotely hosted DNN with no access to the model, its parameters, or its training data, and introduces the attack strategy of fitting a substitute model to the target's input-output pairs and then crafting adversarial examples against this auxiliary model.
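A schematic of the substitute-model strategy described above, assuming a hypothetical `query_remote_model` oracle and a hypothetical `SubstituteNet` architecture: label a seed set with the black-box model's predictions, fit a local substitute on those pairs, then craft adversarial examples against the substitute and rely on transferability.

```python
# Sketch of a substitute-model black-box attack. `query_remote_model` (returns a
# score vector per input) and `SubstituteNet` are hypothetical placeholders.
import torch
import torch.nn.functional as F

def build_substitute_and_attack(query_remote_model, seed_inputs, epsilon, epochs=10):
    # 1. Label a small seed set with the black-box model's predictions.
    labels = torch.stack([query_remote_model(x) for x in seed_inputs]).argmax(dim=1)
    inputs = torch.stack(list(seed_inputs))

    # 2. Train a local substitute on the (input, label) pairs.
    substitute = SubstituteNet()
    opt = torch.optim.Adam(substitute.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(substitute(inputs), labels).backward()
        opt.step()

    # 3. Craft FGSM examples on the substitute; they often transfer to the target.
    x = inputs.clone().requires_grad_(True)
    loss = F.cross_entropy(substitute(x), labels)
    grad, = torch.autograd.grad(loss, x)
    return (x + epsilon * grad.sign()).clamp(0.0, 1.0).detach()
```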
Adversarial examples in the physical world
- Computer Science, ICLR
- 2017
It is found that a large fraction of adversarial examples are classified incorrectly even when perceived through the camera, which shows that even in physical-world scenarios, machine learning systems are vulnerable to adversarial examples.
Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks
- Computer Science, 2016 IEEE Symposium on Security and Privacy (SP)
- 2016
The study shows that defensive distillation can reduce the effectiveness of adversarial sample creation from 95% to less than 0.5% on a studied DNN, and analytically investigates the generalizability and robustness properties granted by the use of defensive distillation when training DNNs.
Are Accuracy and Robustness Correlated?
- Computer Science, 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)
- 2016
It is demonstrated that better machine learning models are less vulnerable to adversarial examples, and cross-model adversarial transferability is found to hold mostly across similar network topologies.
Explaining and Harnessing Adversarial Examples
- Computer Science, ICLR
- 2015
It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
Evasion Attacks against Machine Learning at Test Time
- Computer Science, ECML/PKDD
- 2013
This work presents a simple but effective gradient-based approach that can be exploited to systematically assess the security of several, widely-used classification algorithms against evasion attacks.
Virtual Adversarial Training for Semi-Supervised Text Classification
- Computer Science, ArXiv
- 2016
This work extends adversarial and virtual adversarial training to the text domain by applying perturbations to the word embeddings in a recurrent neural network rather than to the original input itself.
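Since tokens are discrete, the perturbation described above is applied in embedding space rather than to the raw input. A minimal sketch of that idea, with illustrative `embedding`, `encoder`, and `classifier` modules standing in for the paper's recurrent architecture (the global L2 normalization here is a simplification):

```python
# Sketch of adversarial training in the text domain: perturb the word embeddings
# instead of the discrete tokens. Module names and the normalization scheme are
# illustrative assumptions, not the paper's exact architecture.
import torch
import torch.nn.functional as F

def adversarial_loss_on_embeddings(embedding, encoder, classifier, tokens, y, epsilon):
    emb = embedding(tokens)                       # (batch, seq_len, dim)
    emb_adv = emb.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(classifier(encoder(emb_adv)), y)
    grad, = torch.autograd.grad(loss, emb_adv)
    # L2-normalized perturbation applied to the embedding sequence.
    r_adv = epsilon * grad / (grad.norm() + 1e-12)
    # Adversarial loss backpropagates through embedding, encoder, and classifier.
    return F.cross_entropy(classifier(encoder(emb + r_adv)), y)
```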
Distributional Smoothing with Virtual Adversarial Training
- Computer Science, ICLR 2016
- 2015
When the LDS-based regularization was applied to supervised and semi-supervised learning on the MNIST dataset, it outperformed all training methods other than the current state-of-the-art method, which is based on a highly advanced generative model.
Learning with a Strong Adversary
- Computer Science, ArXiv
- 2015
A new and simple way of finding adversarial examples is presented and experimentally shown to be efficient, and it greatly improves the robustness of the classification models produced.