Corpus ID: 211818320

Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks

@article{Croce2020ReliableEO,
  title={Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks},
  author={Francesco Croce and Matthias Hein},
  journal={ArXiv},
  year={2020},
  volume={abs/2003.01690}
}
The field of defense strategies against adversarial attacks has grown significantly in recent years, but progress is hampered because the evaluation of adversarial defenses is often insufficient and thus gives a wrong impression of robustness. Many seemingly promising defenses have later been broken, making it difficult to identify the state of the art. Frequent pitfalls in the evaluation are improper tuning of the attacks' hyperparameters and gradient obfuscation or masking. In this paper we first…
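As a concrete illustration of the evaluation protocol this paper argues for, the attack ensemble described here is distributed as the open-source autoattack Python package. The sketch below shows a minimal robustness evaluation with it, assuming a pretrained PyTorch classifier model and test tensors x_test, y_test with values in [0, 1]; the exact constructor arguments may differ across package versions.

  from autoattack import AutoAttack

  # model: a PyTorch module returning logits; x_test, y_test: test images and labels
  adversary = AutoAttack(model, norm='Linf', eps=8/255, version='standard')
  # Runs the parameter-free ensemble (APGD-CE, targeted APGD-DLR, FAB, Square Attack)
  # and reports the resulting robust accuracy
  x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=128)

Reporting accuracy on x_adv rather than on a single hand-tuned PGD run is what avoids the hyperparameter-tuning and gradient-masking pitfalls mentioned above.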

RobustBench: a standardized adversarial robustness benchmark

This work evaluates the robustness of the models in its benchmark with AutoAttack, an ensemble of white-box and black-box attacks that was recently shown in a large-scale study to improve almost all robustness evaluations compared to the original publications.
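RobustBench also exposes the benchmarked models programmatically; the sketch below, assuming the robustbench package and a model identifier taken from the public leaderboard, shows how a leaderboard entry can be loaded and re-evaluated with AutoAttack.

  from robustbench.utils import load_model
  from autoattack import AutoAttack

  # Model name, dataset and threat model follow the public leaderboard; treat them as illustrative
  model = load_model(model_name='Carmon2019Unlabeled', dataset='cifar10', threat_model='Linf')
  adversary = AutoAttack(model, norm='Linf', eps=8/255)
  # x_test, y_test: CIFAR-10 test tensors loaded separately, e.g. via torchvision
  # x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=128)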

Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack

A new white-box adversarial attack for neural network-based classifiers that aims at finding the minimal perturbation necessary to change the class of a given input; it performs better than or comparably to state-of-the-art attacks, which are partially specialized to a single $l_p$-norm, and is robust to the phenomenon of gradient masking.
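The objective behind this attack is the standard minimal-perturbation formulation: writing $f$ for the classifier and $c$ for the correct label of an input $x$, it seeks

  $\min_{\delta} \|\delta\|_p \quad \text{s.t.} \quad \arg\max_k f_k(x+\delta) \neq c, \quad x+\delta \in [0,1]^d$

i.e. the smallest $l_p$-perturbation that changes the predicted class while keeping the image valid.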

Label Smoothing and Adversarial Robustness

The robustness produced by label smoothing is shown to be incomplete: its defensive effect is volatile, and it cannot defend against attacks transferred from a naturally trained model. This finding prompts the research community to rethink how to evaluate a model's robustness appropriately.
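For context, label smoothing replaces the one-hot training target with a softened distribution; in a common formulation with smoothing factor $\alpha$ and $K$ classes, the target for true class $c$ becomes

  $\tilde{y}_k = (1-\alpha)\,\mathbf{1}[k=c] + \alpha/K, \quad k=1,\dots,K$

and the cross-entropy loss is computed against $\tilde{y}$ instead of the one-hot vector.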

Lagrangian Objective Function Leads to Improved Unforeseen Attack Generalization in Adversarial Training

This paper proposes a simple modification to adversarial training (AT) that keeps the perturbation's $l_p$-norm small while maximizing the classification loss in Lagrangian form, and argues that crafting adversarial examples under this scheme results in enhanced attack generalization in the learned model.
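In this form the inner maximization of adversarial training can be sketched as

  $\max_{\delta} \; \mathcal{L}\big(f(x+\delta), y\big) - \lambda \|\delta\|_p$

where $\mathcal{L}$ is the classification loss and $\lambda$ a multiplier trading off loss increase against perturbation size (notation chosen here for illustration, not taken from the paper), instead of maximizing the loss over a fixed $\epsilon$-ball.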

Guided Adversarial Attack for Evaluating and Enhancing Adversarial Defenses

This work proposes the Guided Adversarial Margin Attack (GAMA), which uses the function mapping of the clean image to guide the generation of adversaries and thereby yields stronger attacks, together with a relaxation term added to the standard loss that increases attack efficacy and leads to more efficient adversarial training.

Adversarial Robustness under Long-Tailed Distribution

This work reveals the negative impact of imbalanced data on both recognition performance and adversarial robustness, and proposes a clean yet effective framework, RoBal, consisting of two dedicated modules: a scale-invariant classifier and data re-balancing via margin engineering at the training stage and boundary adjustment during inference.
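A scale-invariant classifier of this kind is commonly realized as a cosine classifier, in which logits depend only on the angles between features and class weights; the following PyTorch sketch illustrates the general construction (a generic example, not RoBal's exact module).

  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  class CosineClassifier(nn.Module):
      # Scale-invariant classifier: logits are scaled cosine similarities between
      # L2-normalized features and L2-normalized class weight vectors.
      def __init__(self, feat_dim, num_classes, scale=16.0):
          super().__init__()
          self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
          self.scale = scale

      def forward(self, features):
          return self.scale * F.linear(F.normalize(features, dim=1),
                                       F.normalize(self.weight, dim=1))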

Imbalanced Gradients: A New Cause of Overestimated Adversarial Robustness

Twelve state-of-the-art defense models are examined, and it is found that models exploiting label smoothing easily exhibit imbalanced gradients, on which attacks can decrease the reported PGD robustness by over 23%.

Improving Ensemble Robustness by Collaboratively Promoting and Demoting Adversarial Robustness

This work proposes a simple but effective strategy for collaboration among the committee models of an ensemble via secure and insecure sets defined for each model member on a given sample. This provides the flexibility to reduce adversarial transferability and to promote the diversity of ensemble members, two crucial factors for better robustness in this ensemble approach.

Automated Discovery of Adaptive Attacks on Adversarial Defenses

This work presents an extensible framework that defines a search space over a set of reusable building blocks and automatically discovers an effective attack on a given model with an unknown defense by searching over suitable combinations of these blocks.

Evaluating the Robustness of Geometry-Aware Instance-Reweighted Adversarial Training

This report rigorously evaluates the adversarial robustness of the very recent "Geometry-aware Instance-reweighted Adversarial Training" (GAIRAT) method and provides insights into the reasons behind GAIRAT's vulnerability to the adversarial attack used in the evaluation.
...

References

SHOWING 1-10 OF 58 REFERENCES

Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack

A new white-box adversarial attack for neural network-based classifiers that aims at finding the minimal perturbation necessary to change the class of a given input; it performs better than or comparably to state-of-the-art attacks, which are partially specialized to a single $l_p$-norm, and is robust to the phenomenon of gradient masking.

Bilateral Adversarial Training: Towards Fast Training of More Robust Models Against Adversarial Attacks

  • Jianyu Wang
  • 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019
Experiments on the computationally very challenging ImageNet dataset further demonstrate the effectiveness of the fast method and show that the random start and the most-confusing-target attack effectively prevent the label-leaking and gradient-masking problems.

Scaling up the Randomized Gradient-Free Adversarial Attack Reveals Overestimation of Robustness Using Established Attacks

This work significantly improves the randomized gradient-free attack for ReLU networks (Croce and Hein in GCPR, 2018), in particular by scaling it up to large networks, thus revealing an overestimation of the robustness by state-of-the-art methods.

Certified Adversarial Robustness with Additive Noise

This work establishes a connection between robustness against adversarial perturbation and additive random noise, and proposes a training strategy that can significantly improve the certified bounds.
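The additive-noise view is typically exercised at inference time by aggregating predictions over noisy copies of the input; the sketch below shows this generic smoothing step in PyTorch (the Gaussian noise model, noise level and sample count are illustrative assumptions, and no certified bound is computed).

  import torch

  def smoothed_predict(model, x, sigma=0.25, n_samples=100):
      # Majority-vote prediction over Gaussian-perturbed copies of x;
      # a generic additive-noise smoothing sketch without certification.
      with torch.no_grad():
          counts = torch.zeros_like(model(x))                  # per-class vote counts
          for _ in range(n_samples):
              noisy = x + sigma * torch.randn_like(x)          # additive Gaussian noise
              preds = model(noisy).argmax(dim=1)
              counts[torch.arange(x.size(0)), preds] += 1
      return counts.argmax(dim=1)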

Metric Learning for Adversarial Robustness

An empirical analysis of deep representations under the state-of-the-art PGD attack finds that the attack causes the internal representation to shift closer to the "false" class; regularizing the representation space under attack with metric learning is then proposed to produce more robust classifiers.

Adversarial Defense via Learning to Generate Diverse Attacks

This work proposes a recursive and stochastic generator that produces much stronger and diverse perturbations that comprehensively reveal the vulnerability of the target classifier.

Towards Deep Learning Models Resistant to Adversarial Attacks

This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
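The first-order adversary referenced here is usually instantiated as projected gradient descent (PGD); a minimal $l_\infty$ PGD sketch in PyTorch follows, where the step size, iteration count and cross-entropy loss are illustrative choices rather than the paper's exact settings.

  import torch
  import torch.nn.functional as F

  def pgd_linf(model, x, y, eps=8/255, alpha=2/255, steps=10):
      # L-infinity PGD: random start, signed-gradient ascent on the loss,
      # projection back onto the eps-ball around x and the valid range [0, 1].
      x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
      for _ in range(steps):
          x_adv = x_adv.detach().requires_grad_(True)
          loss = F.cross_entropy(model(x_adv), y)
          grad, = torch.autograd.grad(loss, x_adv)
          with torch.no_grad():
              x_adv = x_adv + alpha * grad.sign()                    # ascent step
              x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to eps-ball
              x_adv = x_adv.clamp(0, 1)                              # keep pixels valid
      return x_adv.detach()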

Adversarial Robustness through Local Linearization

A novel regularizer is introduced that encourages the loss to behave linearly in the vicinity of the training data, thereby penalizing gradient obfuscation while encouraging robustness; extensive experiments on CIFAR-10 and ImageNet show that models trained with this regularizer avoid gradient obfuscation and can be trained significantly faster than with adversarial training.
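The local linearity being encouraged can be made precise with a quantity of the following form: writing $\ell(x)$ for the loss at input $x$ and $B(\epsilon)$ for the perturbation ball,

  $\gamma(\epsilon, x) = \max_{\delta \in B(\epsilon)} \big| \ell(x+\delta) - \ell(x) - \delta^\top \nabla_x \ell(x) \big|$

is small exactly when the loss stays close to its first-order Taylor expansion around $x$; penalizing such a term (the notation here is a sketch, not necessarily the paper's) discourages the sharply curved loss surfaces associated with gradient obfuscation.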

Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks

This work proposes to disentangle the intermediate feature representations of deep networks class-wise, forcing the features of each class to lie inside a convex polytope that is maximally separated from the polytopes of the other classes.

Certified Robustness to Adversarial Examples with Differential Privacy

This paper presents the first certified defense that both scales to large networks and datasets and applies broadly to arbitrary model types, based on a novel connection between robustness against adversarial examples and differential privacy, a cryptographically-inspired privacy formalism.
...