• Corpus ID: 201125165

# Testing Robustness Against Unforeseen Adversaries

@article{Kang2019TestingRA,
  title={Testing Robustness Against Unforeseen Adversaries},
  author={Daniel Kang and Yi Sun and Dan Hendrycks and Tom B. Brown and Jacob Steinhardt},
  journal={ArXiv},
  year={2019},
  volume={abs/1908.08016}
}
Most existing adversarial defenses only measure robustness to L_p adversarial attacks. Not only are adversaries unlikely to exclusively create small L_p perturbations, adversaries are unlikely to remain fixed. Adversaries adapt and evolve their attacks; hence adversarial defenses must be robust to a broad range of unforeseen attacks. We address this discrepancy between research and reality by proposing a new evaluation framework called ImageNet-UA. Our framework enables the research community…
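As a concrete illustration of the $L_p$ threat model the abstract refers to, here is a minimal sketch of a projected-gradient-descent (PGD) attack under an $L_\infty$ bound. The model (a hand-rolled logistic regression), its weights, and the attack hyperparameters are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pgd_linf(x, y, w, b, eps=0.1, alpha=0.02, steps=20):
    """L_inf-bounded PGD against a logistic-regression classifier.

    Ascends the cross-entropy loss of (x + delta, y) subject to
    ||delta||_inf <= eps, taking gradient-sign steps and projecting
    back onto the eps-ball after each step.
    """
    delta = np.zeros_like(x)
    for _ in range(steps):
        p = sigmoid(np.dot(w, x + delta) + b)
        # d(loss)/d(input) for sigmoid + cross-entropy is (p - y) * w
        grad = (p - y) * w
        delta += alpha * np.sign(grad)     # gradient-sign ascent step
        delta = np.clip(delta, -eps, eps)  # project onto the L_inf ball
    return x + delta

# Toy example: perturb a correctly classified point
w, b = np.array([2.0, -1.0]), 0.0
x, y = np.array([1.0, 0.5]), 1.0
x_adv = pgd_linf(x, y, w, b)
print(sigmoid(np.dot(w, x) + b))      # clean confidence for the true class
print(sigmoid(np.dot(w, x_adv) + b))  # adversarial confidence (lower)
```

The framework proposed in the paper evaluates robustness against attacks well beyond this standard template (e.g., non-$L_p$ distortions), which is precisely the gap the abstract highlights.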

## Citations

Confidence-Calibrated Adversarial Training and Detection: More Robust Models Generalizing Beyond the Attack Used During Training
Adversarial training is the standard to train models robust against adversarial examples. However, especially for complex datasets, adversarial training incurs a significant loss in accuracy and is…
Interpolated Joint Space Adversarial Training for Robust and Generalizable Defenses
The Robust Mixup strategy, in which the adversity of the interpolated images is maximized to gain robustness and prevent overfitting, is proposed; IJSAT achieves good performance in standard accuracy, robustness, and generalization on the CIFAR-10/100, OM-ImageNet, and CIFAR-10-C datasets.
$\ell_\infty$-Robustness and Beyond: Unleashing Efficient Adversarial Training
• Computer Science
• 2021
By leveraging the theory of coreset selection, it is shown how selecting a small subset of training data provides a more principled approach towards reducing the time complexity of robust training.
Dual Manifold Adversarial Robustness: Defense Against Lp and Non-Lp Adversarial Attacks
• Computer Science
NeurIPS
• 2020
The proposed Dual Manifold Adversarial Training (DMAT) improves performance on normal images, and achieves comparable robustness to the standard adversarial training against Lp attacks, and models defended by DMAT achieve improved robustness against novel attacks which manipulate images by global color shifts or various types of image filtering.
Towards Defending Multiple Adversarial Perturbations via Gated Batch Normalization
Gated Batch Normalization (GBN) is proposed, a novel building block for deep neural networks that improves robustness against multiple perturbation types and performs well on MNIST, CIFAR-10, and Tiny-ImageNet.
Self-Progressing Robust Training
• Computer Science
AAAI
• 2021
A new framework called SPROUT is proposed, self-progressing robust training, that progressively adjusts training label distribution via the authors' proposed parametrized label smoothing technique, making training free of attack generation and more scalable to large neural networks.
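SPROUT's parametrized label smoothing generalizes standard label smoothing, which replaces a one-hot target with a mixture of the one-hot vector and the uniform distribution. A minimal sketch of the standard variant (the smoothing factor and class count below are illustrative, not SPROUT's learned parametrization):

```python
import numpy as np

def smooth_labels(y_onehot, alpha=0.1):
    """Mix a one-hot target with the uniform distribution over K classes."""
    k = y_onehot.shape[-1]
    return (1.0 - alpha) * y_onehot + alpha / k

y = np.array([0.0, 1.0, 0.0, 0.0])
print(smooth_labels(y))  # [0.025, 0.925, 0.025, 0.025]
```

Because the smoothed target is attack-free, training on it avoids the cost of generating adversarial examples, which is the scalability advantage the summary describes.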
Mutual Adversarial Training: Learning together is better than going alone
• Jiang Liu
• Computer Science
ArXiv
• 2021
This paper proposes mutual adversarial training (MAT), in which multiple models are trained together and share the knowledge of adversarial examples to achieve improved robustness, and demonstrates that collaborative learning is an effective strategy for designing robust models.
Lagrangian Objective Function Leads to Improved Unforeseen Attack Generalization in Adversarial Training
• Computer Science
ArXiv
• 2021
This paper proposes a simple modification to AT that penalizes the perturbation $\ell_p$ norm while maximizing the classification loss in Lagrangian form, and argues that crafting adversarial examples under this scheme yields enhanced attack generalization in the learned model.
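The Lagrangian form mentioned in this summary can be written (in notation assumed here, not taken from the cited paper) as

$$\max_{\delta} \; \mathcal{L}\big(f_\theta(x+\delta),\, y\big) \;-\; \lambda \,\lVert \delta \rVert_p,$$

where the penalty term $\lambda \lVert \delta \rVert_p$ discourages large perturbations in place of the hard $\varepsilon$-ball constraint used by standard adversarial training.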
Towards Transferable Adversarial Perturbations with Minimum Norm
Transfer-based adversarial examples are one of the most important classes of black-box attacks. Prior work in this direction often requires a fixed but large perturbation radius to reach a good…

## References

Showing 1-10 of 62 references
Adversarial Training and Robustness for Multiple Perturbations
• Computer Science, Mathematics
NeurIPS
• 2019
It is proved that a trade-off in robustness to different types of $\ell_p$-bounded and spatial perturbations must exist in a natural and simple statistical setting, calling into question the viability and computational scalability of extending adversarial robustness, and adversarial training, to multiple perturbation types.
Barrage of Random Transforms for Adversarially Robust Defense
• Computer Science
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
• 2019
It is shown that, even after accounting for obfuscated gradients, the Barrage of Random Transforms (BaRT) is a resilient defense against even the most difficult attacks, such as PGD.
Exploiting Excessive Invariance caused by Norm-Bounded Adversarial Robustness
• Computer Science, Mathematics
ArXiv
• 2019
This paper demonstrates that robustness to perturbation-based adversarial examples is not only insufficient for general robustness, but worse, it can also increase vulnerability of the model to invariance-based adversaries, and argues that the term adversarial example is used to capture a series of model limitations.
Towards Deep Learning Models Resistant to Adversarial Attacks
• Computer Science, Mathematics
ICLR
• 2018
This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
Adversarial Robustness through Local Linearization
A novel regularizer is introduced that encourages the loss to behave linearly in the vicinity of the training data, thereby penalizing gradient obfuscation while encouraging robustness and shows via extensive experiments on CIFAR-10 and ImageNet, that models trained with this regularizer avoid gradient obfuscations and can be trained significantly faster than adversarial training.
Quantifying Perceptual Distortion of Adversarial Examples
• Computer Science, Mathematics
ArXiv
• 2019
This work presents and employs a unifying framework fusing different attack styles to demonstrate the value of quantifying the perceptual distortion of adversarial examples, and performs adversarial training using attacks generated by the framework to demonstrate that networks are only robust to classes of adversarial perturbations they have been trained against.
Constructing Unrestricted Adversarial Examples with Generative Models
• Computer Science, Mathematics
NeurIPS
• 2018
The empirical results on the MNIST, SVHN, and CelebA datasets show that unrestricted adversarial examples can bypass strong adversarial training and certified defense methods designed for traditional adversarial attacks.
Attacking the Madry Defense Model with $L_1$-based Adversarial Examples
The experimental results demonstrate that by relaxing the constraint of the competition, the elastic-net attack to deep neural networks (EAD) can generate transferable adversarial examples which, despite their high average $L_\infty$ distortion, have minimal visual distortion.