Corpus ID: 3488815

Towards Deep Learning Models Resistant to Adversarial Attacks

@article{Madry2018TowardsDL,
  title={Towards Deep Learning Models Resistant to Adversarial Attacks},
  author={Aleksander Madry and Aleksandar Makelov and Ludwig Schmidt and Dimitris Tsipras and Adrian Vladu},
  journal={ArXiv},
  year={2018},
  volume={abs/1706.06083}
}
Recent work has demonstrated that neural networks are vulnerable to adversarial examples, i.e., inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. [...] Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, these methods specify a concrete security guarantee that would protect against any adversary.
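The training method the abstract alludes to is adversarial training driven by a projected gradient descent (PGD) adversary inside a min-max (saddle point) objective. The following is a minimal PyTorch sketch of that loop, not the authors' reference implementation; the model, loader, optimizer, eps, alpha, and steps names are placeholder assumptions, and the hyperparameter values are illustrative for images scaled to [0, 1].

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Inner maximization: L_inf-bounded PGD starting from a random point in the eps-ball."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # ascend the loss, then project back onto the eps-ball and valid pixel range
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1).detach()
    return x_adv

def adversarial_training_epoch(model, loader, optimizer, device="cpu"):
    """Outer minimization: one epoch of training on PGD adversarial examples only."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)
        optimizer.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()
        optimizer.step()
```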
Citations

Divide-and-Conquer Adversarial Detection
TLDR
This paper trains adversary-robust auxiliary detectors to discriminate in-class natural examples from adversarially crafted out-of-class examples, and demonstrates that with the novel training scheme the models learn significantly more robust representations than with ordinary adversarial training.
Hardening Deep Neural Networks via Adversarial Model Cascades
TLDR
The proposed Adversarial Model Cascades (AMC) approach trains a cascade of models sequentially, where each model is optimized to be robust to a mixture of multiple attacks, yielding a single model that is secure against a wide range of attacks.
Towards Natural Robustness Against Adversarial Examples
TLDR
This paper theoretically proves that there is an upper bound for neural networks with identity mappings that constrains the error caused by adversarial noise, and demonstrates that a new family of deep neural networks, Neural ODEs (Chen et al., 2018), admits a weaker upper bound.
On the Connection between Differential Privacy and Adversarial Robustness in Machine Learning
TLDR
This work proposes PixelDP, a strategy for learning robust deep neural networks based on formal differential privacy (DP) guarantees, and observes that the semantics of DP are closely aligned with the formal definition of robustness to adversarial examples.
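PixelDP's core idea, as summarized above, is to turn prediction into a randomized mechanism: inject noise into an early part of the network and average prediction scores over noise draws. The sketch below is a rough, uncalibrated PyTorch illustration of that structure; NoisyClassifier, sigma, and n_draws are assumed names, and the noise scale is not the sensitivity-calibrated value from the paper.

```python
import torch
import torch.nn as nn

class NoisyClassifier(nn.Module):
    """Toy PixelDP-style model: Gaussian noise injected after a pre-noise feature block.

    In the actual PixelDP construction the noise standard deviation is calibrated to the
    sensitivity of the pre-noise computation so the scores satisfy a DP-style bound;
    here sigma is just a free knob for illustration.
    """

    def __init__(self, features: nn.Module, head: nn.Module, sigma: float = 0.25):
        super().__init__()
        self.features = features  # pre-noise layers (sensitivity-bounded in the paper)
        self.head = head          # post-noise layers
        self.sigma = sigma

    def forward(self, x):
        h = self.features(x)
        h = h + self.sigma * torch.randn_like(h)  # noise at train AND test time
        return self.head(h)

@torch.no_grad()
def smoothed_predict(model, x, n_draws: int = 32):
    """Average softmax scores over independent noise draws (Monte Carlo smoothing)."""
    probs = torch.stack([model(x).softmax(dim=-1) for _ in range(n_draws)])
    return probs.mean(dim=0)
```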
Defending Against Adversarial Attacks Using Random Forests
TLDR
This paper proposes to use a simple yet very effective non-differentiable hybrid model that combines DNNs and random forests, rather than hiding gradients from attackers, to defend against adversarial attacks.
Learning to Disentangle Robust and Vulnerable Features for Adversarial Detection
TLDR
This work hypothesizes that the adversarial inputs are tied to latent features that are susceptible to adversarial perturbation, which is called vulnerable features, and proposes a minimax game formulation to disentangle the latent features of each instance into robust and vulnerable ones, using variational autoencoders with two latent spaces.
Towards Understanding and Improving the Transferability of Adversarial Examples in Deep Neural Networks
TLDR
This work empirically investigates two classes of factors that might influence the transferability of adversarial examples, including model-specific factors such as network architecture, model capacity, and test accuracy, and proposes a simple but effective strategy to improve transferability.
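A common way to quantify the transferability studied here is to craft adversarial examples on a source model and record how often they also cause errors on an independently trained target model. A minimal sketch, assuming FGSM as a stand-in attack and illustrative source, target, and loader names:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8/255):
    """Single-step gradient-sign attack on the source model (a simple stand-in)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0, 1).detach()

@torch.no_grad()
def transfer_rate(source, target, loader, eps=8/255, device="cpu"):
    """Error rate of the target model on source-crafted examples (a transferability proxy)."""
    fooled, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        with torch.enable_grad():            # the attack needs gradients w.r.t. x
            x_adv = fgsm(source, x, y, eps)
        pred = target(x_adv).argmax(dim=-1)
        fooled += (pred != y).sum().item()
        total += y.numel()
    return fooled / total
```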
Adversarial Attacks and Defenses
TLDR
It is observed that many existing adversarial attack and defense methods, although not explicitly framed as such, can be understood from the perspective of interpretation, and the challenges and future directions for tackling adversarial issues with interpretation are discussed.
Defending Against Adversarial Samples Without Security through Obscurity
TLDR
This work proposes a generic approach that integrates a data transformation module with a DNN, making it robust even if the underlying learning algorithm is revealed, and evaluates the generality of this proposed approach and its potential for handling cyber security applications.

References

Showing 1-10 of 42 references
Towards Evaluating the Robustness of Neural Networks
TLDR
It is demonstrated that defensive distillation does not significantly increase the robustness of neural networks, and three new attack algorithms are introduced that succeed on both distilled and undistilled neural networks with 100% probability.
Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks
TLDR
The study shows that defensive distillation can reduce the effectiveness of adversarial sample creation from 95% to less than 0.5% on a studied DNN, and analytically investigates the generalizability and robustness properties granted by defensive distillation when training DNNs.
Ground-Truth Adversarial Examples
TLDR
Ground truths, adversarial examples with provably minimal distance from a given input point, are constructed; they can serve to assess the effectiveness of both attack and defense techniques by computing the distance to the ground truths before and after a defense is applied and measuring the improvement.
The Limitations of Deep Learning in Adversarial Settings
TLDR
This work formalizes the space of adversaries against deep neural networks (DNNs) and introduces a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs.
Towards Deep Neural Network Architectures Robust to Adversarial Examples
TLDR
Deep Contractive Network is proposed, a model with a new end-to-end training procedure that includes a smoothness penalty inspired by the contractive autoencoder (CAE) to increase the network robustness to adversarial examples, without a significant performance penalty.
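The smoothness penalty mentioned above encourages the network's mapping to change slowly around each training input. The original Deep Contractive Network uses a layer-wise approximation of the contractive autoencoder's Jacobian penalty; since full Jacobians are costly, the sketch below uses a cheaper input-gradient penalty on the loss as an illustrative stand-in, not the paper's exact term. The function name and lambda_c are assumptions.

```python
import torch
import torch.nn.functional as F

def contractive_style_loss(model, x, y, lambda_c=0.1):
    """Cross-entropy plus an input-gradient penalty encouraging a locally flat mapping.

    The Deep Contractive Network penalizes layer-wise Jacobian norms; this single
    penalty on d(loss)/d(input) is a cheaper stand-in with the same intent.
    """
    x = x.clone().detach().requires_grad_(True)
    ce = F.cross_entropy(model(x), y)
    grad_x = torch.autograd.grad(ce, x, create_graph=True)[0]  # keep graph for backprop
    penalty = grad_x.pow(2).sum(dim=tuple(range(1, grad_x.dim()))).mean()
    return ce + lambda_c * penalty
```

In a training loop, `contractive_style_loss(model, x, y).backward()` then propagates gradients of both the classification term and the smoothness term to the model parameters.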
Towards Robust Deep Neural Networks with BANG
TLDR
A novel theory is presented to explain why this susceptibility to adversarial perturbations arises in deep neural networks, and a simple, efficient, and effective training approach, Batch Adjusted Network Gradients (BANG), is introduced that significantly improves the robustness of machine learning models.
Ensemble Adversarial Training: Attacks and Defenses
TLDR
This work finds that adversarially trained models remain vulnerable to black-box attacks, where perturbations computed on undefended models are transferred, as well as to a powerful novel single-step attack that escapes the non-smooth vicinity of the input data via a small random step.
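The single-step attack described above prepends a small random step to the usual gradient-sign step, so the gradient is taken away from the non-smooth neighborhood of the data point. A hedged PyTorch sketch of that idea, with illustrative eps and alpha values for inputs in [0, 1]:

```python
import torch
import torch.nn.functional as F

def rand_plus_fgsm(model, x, y, eps=8/255, alpha=4/255):
    """Random step followed by a single gradient-sign step, total L_inf budget eps."""
    # 1) small random step away from x (escapes the non-smooth vicinity of the data point)
    x_rand = (x + alpha * torch.sign(torch.randn_like(x))).clamp(0, 1)
    x_rand = x_rand.clone().detach().requires_grad_(True)
    # 2) one gradient-sign step of size (eps - alpha) from the randomized point
    loss = F.cross_entropy(model(x_rand), y)
    grad = torch.autograd.grad(loss, x_rand)[0]
    x_adv = x_rand + (eps - alpha) * grad.sign()
    # project back into the eps-ball around the original x and the valid pixel range
    x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```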
Adversarial Machine Learning at Scale
TLDR
This research applies adversarial training to ImageNet, finds that single-step attacks are the best for mounting black-box attacks, and resolves a "label leaking" effect that causes adversarially trained models to perform better on adversarial examples than on clean examples.
Towards the first adversarially robust neural network model on MNIST
TLDR
A novel robust classification model that performs analysis by synthesis using learned class-conditional data distributions is presented, and it is demonstrated that most adversarial examples are strongly perturbed towards the perceptual boundary between the original and the adversarial class.
The Space of Transferable Adversarial Examples
TLDR
It is found that adversarial examples span a contiguous subspace of large (~25) dimensionality, which indicates that it may be possible to design defenses against transfer-based attacks, even for models that are vulnerable to direct attacks.