# Towards Evaluating the Robustness of Neural Networks

```bibtex
@article{Carlini2017TowardsET,
  title={Towards Evaluating the Robustness of Neural Networks},
  author={Nicholas Carlini and David A. Wagner},
  journal={2017 IEEE Symposium on Security and Privacy (SP)},
  year={2017},
  pages={39-57}
}
```
• Published 16 August 2016
• Computer Science
• 2017 IEEE Symposium on Security and Privacy (SP)
Neural networks provide state-of-the-art results for most machine learning tasks. [...] Our attacks are tailored to three distance metrics used previously in the literature, and when compared to previous adversarial example generation algorithms, our attacks are often much more effective (and never worse). Furthermore, we propose using high-confidence adversarial examples in a simple transferability test we show can also be used to break defensive distillation. We hope our attacks will be used as a…
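The optimization-based attack the abstract summarizes can be sketched, for the L2 distance metric, as minimizing ‖δ‖² plus a hinge on the logit margin. The toy linear classifier, parameter names, and constants below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def cw_l2_attack(W, x, true_label, c=2.0, kappa=0.5, lr=0.1, steps=200):
    """Minimize ||delta||_2^2 + c * max(logit_true - max_other_logit + kappa, 0)
    for a linear classifier with logits W @ x; kappa controls attack confidence."""
    delta = np.zeros_like(x)
    others = [i for i in range(W.shape[0]) if i != true_label]
    for _ in range(steps):
        logits = W @ (x + delta)
        rival = others[int(np.argmax(logits[others]))]  # strongest wrong class
        margin = logits[true_label] - logits[rival] + kappa
        grad = 2.0 * delta                       # gradient of the L2 term
        if margin > 0:                           # hinge is active
            grad = grad + c * (W[true_label] - W[rival])
        delta = delta - lr * grad                # gradient descent step
    return delta
```

For a deep network the closed-form logit gradient would be replaced by backpropagation, but the objective is the same shape.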
#### 4,012 Citations
Detecting Adversarial Examples Using Data Manifolds
• Computer Science
• MILCOM 2018 - 2018 IEEE Military Communications Conference (MILCOM)
• 2018
Identifying the limitations of the learning model offers a more tractable approach to protecting against adversarial attacks: a low-dimensional manifold in which the training samples lie is identified, and the distance of a new observation from this manifold is used to decide whether the data point is adversarial.
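A minimal linear version of this idea, assuming a PCA principal subspace as the "manifold" and reconstruction error as the distance (the paper's actual manifold construction may differ):

```python
import numpy as np

def fit_pca_manifold(X, k):
    """Fit a k-dimensional principal subspace to training data X (rows = samples)."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]                     # mean and top-k principal directions

def manifold_distance(x, mu, V):
    """Reconstruction error: distance from x to the fitted subspace."""
    proj = mu + (x - mu) @ V.T @ V        # project x onto the subspace
    return float(np.linalg.norm(x - proj))
```

A detector would then flag inputs whose distance exceeds a threshold calibrated on clean data.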
Learning More Robust Features with Adversarial Training
• Computer Science, Mathematics
• ArXiv
• 2018
This paper proposes a method, which can be seen as an extension of adversarial training, to train neural networks to learn more robust features, and shows that this method greatly improves the robustness of the learned features and the resistance to adversarial attacks.
Towards Deep Learning Models Resistant to Adversarial Attacks
• Computer Science, Mathematics
• ICLR
• 2018
This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
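The robust-optimization view casts training as a min-max problem whose inner maximization is solved by projected gradient descent (PGD). A sketch of that inner step on a toy differentiable loss, with all names and constants illustrative:

```python
import numpy as np

def pgd_linf(loss_grad, x, eps=0.3, alpha=0.1, steps=10):
    """Projected gradient ascent on the loss, constrained to the L-inf ball
    of radius eps around x (the inner maximization of min-max training)."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        g = loss_grad(x + delta)            # gradient of loss w.r.t. the input
        delta = delta + alpha * np.sign(g)  # steepest ascent direction under L-inf
        delta = np.clip(delta, -eps, eps)   # project back onto the ball
    return x + delta
```

Adversarial training then minimizes the model's loss on these maximizing points instead of the clean inputs.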
Evaluation and Design of Robust Neural Network Defenses
A general framework for evaluating the robustness of neural networks through optimization-based methods is introduced, and a new classifier is constructed which is provably robust by design under a restricted threat model.
Evaluating the Robustness of Neural Networks: An Extreme Value Theory Approach
This paper provides a theoretical justification for converting robustness analysis into a local Lipschitz constant estimation problem, and proposes to use Extreme Value Theory for efficient evaluation, yielding a novel robustness metric called CLEVER (Cross Lipschitz Extreme Value for nEtwork Robustness).
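CLEVER fits an extreme value (reverse Weibull) distribution to batches of sampled gradient norms; the sketch below substitutes a crude empirical maximum, which illustrates the same local Lipschitz estimation idea under that simplifying assumption:

```python
import numpy as np

def local_lipschitz_estimate(grad_fn, x, radius=0.5, n_samples=500, seed=0):
    """Estimate the local Lipschitz constant of f near x as the largest
    gradient norm observed at random points in a ball around x.
    (CLEVER refines this with an extreme value theory fit.)"""
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(n_samples):
        u = rng.normal(size=x.shape)
        u = u / np.linalg.norm(u) * rng.uniform(0.0, radius)  # random point in ball
        best = max(best, float(np.linalg.norm(grad_fn(x + u))))
    return best
```

A robustness score in this style is then the classification margin divided by the estimated Lipschitz constant.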
Feedback Learning for Improving the Robustness of Neural Networks
• Computer Science, Mathematics
• 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)
• 2019
A feedback learning method is proposed to understand how well a model learns and to facilitate retraining that remedies the defects; it significantly improves models' accuracy and robustness against different types of evasion attacks.
Convergence of Adversarial Training in Overparametrized Neural Networks
• Computer Science, Mathematics
• NeurIPS
• 2019
This paper provides a partial answer to the success of adversarial training, by showing that it converges to a network where the surrogate loss with respect to the attack algorithm is within $\epsilon$ of the optimal robust loss.
Pruning in the Face of Adversaries
• Computer Science
• ArXiv
• 2021
This work evaluates the robustness of pruned models against L0, L2 and L∞ attacks for a wide range of attack strengths, several architectures, data sets, pruning methods, and compression rates, and confirms that neural network pruning and adversarial robustness are not mutually exclusive.
Efficient Two-Step Adversarial Defense for Deep Neural Networks
• Mathematics, Computer Science
• ArXiv
• 2018
This paper empirically demonstrates the effectiveness of the proposed two-step defense approach against different attack methods and its improvements over existing defense strategies, allowing defense against adversarial attacks with a robustness level comparable to that of the adversarial training with multi-step adversarial examples.
• Mathematics, Computer Science
• ArXiv
• 2018
The fundamental mechanisms behind adversarial examples are investigated and a novel robust training method via regulating adversarial gradients is proposed, which effectively squeezes the adversarial gradients of neural networks and significantly increases the difficulty of adversarial example generation.

#### References

Showing 1-10 of 64 references
Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks
• Computer Science, Mathematics
• 2016 IEEE Symposium on Security and Privacy (SP)
• 2016
The study shows that defensive distillation can reduce the effectiveness of adversarial sample creation from 95% to less than 0.5% on a studied DNN, and analytically investigates the generalizability and robustness properties granted by the use of defensive distillation when training DNNs.
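Defensive distillation's central ingredient is a softmax evaluated at an elevated temperature T, which produces the soft labels the distilled network is trained on. A minimal sketch (the T value is illustrative):

```python
import numpy as np

def softmax_T(logits, T=20.0):
    """Softmax at temperature T. Large T smooths the output distribution,
    which is how defensive distillation generates soft training labels."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()
```

The teacher is trained and evaluated at high T to produce soft labels, and the distilled model is deployed at T=1, which is what flattens the gradients that standard attacks rely on.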
Towards Deep Neural Network Architectures Robust to Adversarial Examples
• Computer Science, Mathematics
• ICLR
• 2015
Deep Contractive Network is proposed, a model with a new end-to-end training procedure that includes a smoothness penalty inspired by the contractive autoencoder (CAE) to increase the network robustness to adversarial examples, without a significant performance penalty.
The Limitations of Deep Learning in Adversarial Settings
• Computer Science, Mathematics
• 2016 IEEE European Symposium on Security and Privacy (EuroS&P)
• 2016
This work formalizes the space of adversaries against deep neural networks (DNNs) and introduces a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs.
Adversarial Perturbations Against Deep Neural Networks for Malware Classification
• Computer Science
• ArXiv
• 2016
This paper shows how to construct highly effective adversarial sample crafting attacks for neural networks used as malware classifiers, and evaluates the extent to which potential defensive mechanisms against adversarial crafting can be leveraged in the setting of malware classification.
Thermometer Encoding: One Hot Way To Resist Adversarial Examples
• Computer Science
• ICLR
• 2018
A simple modification to standard neural network architectures, thermometer encoding, is proposed, which significantly increases the robustness of the network to adversarial examples; the properties of these networks are explored, providing evidence that thermometer encodings help neural networks find more non-linear decision boundaries.
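The encoding itself is simple: each input value is discretized into a cumulative one-hot code. A sketch, with the number of levels chosen for illustration:

```python
import numpy as np

def thermometer_encode(x, levels=10):
    """Cumulative one-hot ('thermometer') code: bit i is 1 iff x > i/levels.
    The discretization is non-differentiable, which is what blunts
    gradient-based attacks on the encoded input."""
    thresholds = np.arange(levels) / levels        # 0.0, 0.1, ..., 0.9
    return (np.asarray(x, dtype=float)[..., None] > thresholds).astype(np.float32)
```

Unlike a plain one-hot code, nearby values share prefix bits, so the encoding preserves ordering while removing the smooth input-to-output map.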
Measuring Neural Net Robustness with Constraints
• Computer Science, Mathematics
• NIPS
• 2016
This work proposes metrics for measuring the robustness of a neural net and devises a novel algorithm for approximating these metrics based on an encoding of robustness as a linear program, generating more informative estimates of robustness metrics than estimates based on existing algorithms.
Explaining and Harnessing Adversarial Examples
• Computer Science, Mathematics
• ICLR
• 2015
It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results, while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
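The linearity argument yields a one-step attack, the fast gradient sign method: perturbing each coordinate by ε in the direction of the loss gradient's sign raises a linear loss by exactly ε·‖w‖₁. A sketch on a toy linear loss (names illustrative):

```python
import numpy as np

def fgsm(loss_grad, x, eps=0.1):
    """Fast gradient sign method: one signed-gradient step of size eps."""
    return x + eps * np.sign(loss_grad(x))
```

In high dimensions ‖w‖₁ is large, so even a small per-coordinate ε produces a big loss change, which is the "linear nature" explanation in a nutshell.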
Adversarial examples in the physical world
• Computer Science, Mathematics
• ICLR
• 2017
It is found that a large fraction of adversarial examples are classified incorrectly even when perceived through a camera, which shows that even in physical-world scenarios, machine learning systems are vulnerable to adversarial examples.
Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
• Computer Science
• ArXiv
• 2016
New transferability attacks between previously unexplored (substitute, victim) pairs of machine learning model classes, most notably SVMs and decision trees, are introduced.
Safety Verification of Deep Neural Networks
• Computer Science, Mathematics
• CAV
• 2017
A novel automated verification framework for feed-forward multi-layer neural networks based on Satisfiability Modulo Theory (SMT) is developed, which defines safety for an individual decision in terms of invariance of the classification within a small neighbourhood of the original image.