Corpus ID: 195791557

Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack

@inproceedings{Croce2020MinimallyDA,
  title={Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack},
  author={Francesco Croce and Matthias Hein},
  booktitle={ICML},
  year={2020}
}
The robustness of neural-network-based classifiers against adversarial manipulation is mainly evaluated with empirical attacks, since methods for exact computation, even when available, do not scale to large networks. We propose in this paper a new white-box adversarial attack w.r.t. the $l_p$-norms for $p \in \{1,2,\infty\}$ aimed at finding the minimal perturbation necessary to change the class of a given input. It has an intuitive geometric meaning and quickly yields high-quality…
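For context, the minimal-perturbation problem that such an attack targets can be written in the standard form below (a sketch in our own notation, not copied from the paper), where $f_c$ denotes the classifier score for class $c$ and $x \in [0,1]^d$ is the clean input:

$$\min_{\delta \in \mathbb{R}^d} \|\delta\|_p \quad \text{s.t.} \quad \arg\max_c f_c(x+\delta) \neq \arg\max_c f_c(x), \quad x+\delta \in [0,1]^d, \quad p \in \{1,2,\infty\}.$$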
Citations

Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks
TLDR
Two extensions of the PGD attack, overcoming failures due to suboptimal step sizes and problems with the objective function, are proposed and combined with two complementary existing attacks to form a parameter-free, computationally affordable, and user-independent ensemble of attacks for testing adversarial robustness.
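To make the worst-case-over-attacks evaluation concrete, the sketch below computes robust accuracy over an ensemble of attacks: a point counts as robust only if it survives every attack. The function names, signatures, and PyTorch setting are our own illustrative assumptions, not the actual AutoAttack API.

```python
import torch

# Minimal sketch (assumed PyTorch setting): robust accuracy under the worst case
# over a list of attacks; `attacks` holds callables (model, x, y) -> x_adv and is
# purely illustrative, not the actual AutoAttack interface.
def ensemble_robust_accuracy(model, attacks, x, y):
    model.eval()
    with torch.no_grad():
        robust = model(x).argmax(dim=1) == y            # correctly classified clean points
    for attack in attacks:
        x_adv = attack(model, x, y)                     # each attack proposes adversarial inputs
        with torch.no_grad():
            robust &= model(x_adv).argmax(dim=1) == y   # must survive *every* attack to stay robust
    return robust.float().mean().item()
```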
Output Diversified Initialization for Adversarial Attacks
TLDR
Output Diversified Initialization (ODI), a novel random initialization strategy that can be combined with most existing white-box adversarial attacks, is proposed; attacks equipped with ODI outperform current state-of-the-art attacks against robust models and become much more efficient on several datasets.
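As a rough illustration of the initialization idea described above, the sketch below takes a few signed-gradient steps that push the logits along a random output-space direction before the main attack starts. The $l_\infty$ budget, step count, and step size are our own assumptions, not the paper's settings or implementation.

```python
import torch

# Rough sketch of an output-diversified starting point (assumed l_inf threat model;
# all hyperparameters illustrative, not the authors' implementation).
def output_diversified_init(model, x, eps, steps=2):
    with torch.no_grad():
        w = torch.empty_like(model(x)).uniform_(-1.0, 1.0)    # random direction in logit space
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        diversity = (model(x + delta) * w).sum()              # push the logits along w
        grad, = torch.autograd.grad(diversity, delta)
        delta = (delta + eps * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (x + delta).clamp(0.0, 1.0).detach()               # diversified starting point
```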
Adaptive Regularization for Adversarial Training
TLDR
A new adversarial training algorithm is proposed that is theoretically well motivated and empirically superior to existing algorithms; a novel feature of the proposed algorithm is the use of data-adaptive regularization to robustify the prediction model.
Combating Adversaries with Anti-Adversaries
TLDR
The anti-adversary layer is proposed, which generates an input perturbation in the opposite direction of the adversarial one and feeds the classifier a perturbed version of the input.
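A rough sketch of the mechanism described above: the input is nudged in the direction that increases the classifier's confidence in its own prediction (the opposite of what an adversary does) before the final prediction is made. The budget, step count, and PyTorch interface below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

# Rough sketch of an anti-adversarial preprocessing step (assumptions: l_inf budget
# `eps`, signed-gradient steps, PyTorch classifier returning logits).
def anti_adversary_predict(model, x, eps=8 / 255, steps=2):
    with torch.no_grad():
        y_hat = model(x).argmax(dim=1)                  # the model's own prediction
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y_hat)
        grad, = torch.autograd.grad(loss, delta)
        # move *against* the loss gradient: opposite direction to an adversary
        delta = (delta - (eps / steps) * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    with torch.no_grad():
        return model((x + delta).clamp(0.0, 1.0)).argmax(dim=1)
```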
Towards Transferable Adversarial Perturbations with Minimum Norm
TLDR
This work proposes a geometry-aware framework to generate transferable adversarial perturbations with minimum norm for each input and, analogous to model selection in statistical machine learning, leverages a validation model to select the optimal perturbation budget for each image.
Fast Minimum-norm Adversarial Attacks through Adaptive Norm Constraints
TLDR
A fast minimum-norm (FMN) attack is proposed that works with different $l_p$-norm perturbation models, is robust to hyperparameter choices, does not require adversarial starting points, and converges within a few lightweight steps.
Towards Understanding Fast Adversarial Training
TLDR
This paper conducts experiments to understand the behavior of fast adversarial training and shows that the key to its success is the ability to recover from overfitting to weak attacks; the findings are then extended to improve fast adversarial training, demonstrating robust accuracy superior to strong adversarial training with much-reduced training time.
Constrained Gradient Descent: A Powerful and Principled Evasion Attack Against Neural Networks
TLDR
This paper introduces several innovations that make white-box targeted attacks follow the intuition of the attacker’s goal: to trick the model into assigning a higher probability to the target class than to any other, while staying within a specified distance from the original input.
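The stated goal, pushing the target class score above every other class while staying within a fixed distance of the input, corresponds to a margin-style objective. The sketch below shows one common way to optimize such an objective with projected signed-gradient steps under an $l_\infty$ constraint; it is our own rendering of that intuition, not the paper's algorithm.

```python
import torch
import torch.nn.functional as F

# Illustrative targeted attack maximizing the margin of the target class under an
# l_inf constraint (our own sketch of the stated intuition, not the paper's method).
def targeted_margin_attack(model, x, target, eps=8 / 255, step=2 / 255, steps=50):
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        logits = model(x + delta)
        target_score = logits.gather(1, target.view(-1, 1)).squeeze(1)
        mask = F.one_hot(target, num_classes=logits.shape[1]).bool()
        best_other = logits.masked_fill(mask, float("-inf")).amax(dim=1)
        margin = (target_score - best_other).sum()      # want target above every other class
        grad, = torch.autograd.grad(margin, delta)
        delta = (delta + step * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (x + delta).clamp(0.0, 1.0).detach()
```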
Pixle: a fast and effective black-box attack based on rearranging pixels
TLDR
This paper proposes a novel attack that can be performed without knowledge of either the inner structure of the attacked model or its training procedure, and that is capable of correctly attacking a high percentage of samples by rearranging a small number of pixels within the attacked image.
Localized Uncertainty Attacks
The susceptibility of deep learning models to adversarial perturbations has stirred renewed attention to adversarial examples, resulting in a number of attacks. However, most of these attacks fail to…
...

References

Showing 1-10 of 38 references
Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks
TLDR
Two extensions of the PGD attack, overcoming failures due to suboptimal step sizes and problems with the objective function, are proposed and combined with two complementary existing attacks to form a parameter-free, computationally affordable, and user-independent ensemble of attacks for testing adversarial robustness.
Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation
TLDR
This paper provides, for the first time, formal guarantees on the robustness of a classifier by giving instance-specific lower bounds on the norm of the input manipulation required to change the classifier's decision.
Scaling up the Randomized Gradient-Free Adversarial Attack Reveals Overestimation of Robustness Using Established Attacks
TLDR
This work significantly improves the randomized gradient-free attack for ReLU networks (Croce and Hein in GCPR, 2018), in particular by scaling it up to large networks, thus revealing an overestimation of the robustness by state-of-the-art methods.
Wasserstein Adversarial Examples via Projected Sinkhorn Iterations
TLDR
A new threat model for adversarial attacks based on the Wasserstein distance is proposed and shown to successfully attack image classification models; it is further demonstrated that PGD-based adversarial training under this threat model can improve adversarial accuracy to 76%.
Adversarial Training and Robustness for Multiple Perturbations
TLDR
It is proved that a trade-off in robustness to different types of $\ell_p$-bounded and spatial perturbations must exist in a natural and simple statistical setting, calling into question the viability and computational scalability of extending adversarial robustness, and adversarial training, to multiple perturbation types.
Distributionally Adversarial Attack
TLDR
Distributionally adversarial attack (DAA) is proposed, a framework that solves for an optimal adversarial data distribution: a perturbed distribution that satisfies the $L_\infty$ constraint yet deviates from the original data distribution so as to maximally increase the generalization risk.
Towards Evaluating the Robustness of Neural Networks
TLDR
It is demonstrated that defensive distillation does not significantly increase the robustness of neural networks, and three new attack algorithms are introduced that succeed on both distilled and undistilled neural networks with 100% probability.
Towards Deep Learning Models Resistant to Adversarial Attacks
TLDR
This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
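The robust-optimization view referenced above is usually written as a saddle-point problem: minimize, over the model parameters, the expected maximum of the loss over bounded perturbations. The inner maximization is commonly approximated with projected gradient descent (PGD); the sketch below is the textbook $l_\infty$ PGD variant with illustrative constants, not a specific released implementation.

```python
import torch
import torch.nn.functional as F

# Textbook l_inf PGD approximation of the inner maximization (illustrative constants).
def pgd_linf(model, x, y, eps=8 / 255, step=2 / 255, steps=10):
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)   # random start
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + step * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (x + delta).clamp(0.0, 1.0).detach()
```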
Simple Black-Box Adversarial Perturbations for Deep Networks
TLDR
This work focuses on deep convolutional neural networks and demonstrates that adversaries can easily craft adversarial examples even without any internal knowledge of the target network.
One Pixel Attack for Fooling Deep Neural Networks
TLDR
This paper proposes a novel method for generating one-pixel adversarial perturbations based on differential evolution (DE), which requires less adversarial information (a black-box attack) and can fool more types of networks due to the inherent features of DE.
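As an illustration of the mechanism summarized above, the sketch below searches over a single pixel's position and colour with SciPy's differential evolution, minimizing the model's probability for the true class. The candidate encoding (row, column, RGB), image conventions, and optimizer settings are our own assumptions, not the paper's.

```python
import torch
from scipy.optimize import differential_evolution

# Illustrative one-pixel attack on a single image `img` of shape (3, H, W) in [0, 1]
# (our own sketch; encoding and optimizer settings are assumptions, not the paper's).
def one_pixel_attack(model, img, label, maxiter=30, popsize=20):
    _, h, w = img.shape

    def true_class_prob(z):                               # z = (row, col, r, g, b)
        x = img.clone()
        x[:, int(z[0]), int(z[1])] = torch.tensor(z[2:5], dtype=x.dtype)
        with torch.no_grad():
            return torch.softmax(model(x.unsqueeze(0)), dim=1)[0, label].item()

    bounds = [(0, h - 1), (0, w - 1), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]
    best = differential_evolution(true_class_prob, bounds, maxiter=maxiter, popsize=popsize)
    x_adv = img.clone()
    x_adv[:, int(best.x[0]), int(best.x[1])] = torch.tensor(best.x[2:5], dtype=img.dtype)
    return x_adv
```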
...