• Corpus ID: 201125165

Testing Robustness Against Unforeseen Adversaries

@article{Kang2019TestingRA,
  title={Testing Robustness Against Unforeseen Adversaries},
  author={Daniel Kang and Yi Sun and Dan Hendrycks and Tom B. Brown and Jacob Steinhardt},
  journal={ArXiv},
  year={2019},
  volume={abs/1908.08016}
}
Most existing adversarial defenses only measure robustness to L_p adversarial attacks. Not only are adversaries unlikely to exclusively create small L_p perturbations, adversaries are unlikely to remain fixed. Adversaries adapt and evolve their attacks; hence adversarial defenses must be robust to a broad range of unforeseen attacks. We address this discrepancy between research and reality by proposing a new evaluation framework called ImageNet-UA. Our framework enables the research community… 
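The framework's central measurement is easy to picture as a loop over a suite of held-out attack types, reporting accuracy per attack. The sketch below only illustrates that idea; the `attack_fns` interface, the plain per-attack accuracy metric, and all names are assumptions rather than the released ImageNet-UA code.

```python
# Illustrative only: evaluate a model against a suite of unforeseen attacks and
# report per-attack accuracy. `attack_fns` maps an attack name to a callable
# (model, x, y) -> perturbed inputs; both the interface and the metric are
# assumptions, not the released ImageNet-UA code.
import torch

def evaluate_unforeseen(model, loader, attack_fns, device="cuda"):
    model.eval()
    results = {}
    for name, attack in attack_fns.items():
        correct, total = 0, 0
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            x_adv = attack(model, x, y)          # the attack may use gradients internally
            with torch.no_grad():
                pred = model(x_adv).argmax(dim=1)
            correct += (pred == y).sum().item()
            total += y.numel()
        results[name] = correct / total
    return results
```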
Confidence-Calibrated Adversarial Training
Adversarial training is the standard for training models robust against adversarial examples. However, especially for complex datasets, adversarial training incurs a significant loss in accuracy and is…
Interpolated Joint Space Adversarial Training for Robust and Generalizable Defenses
TLDR
A Robust Mixup strategy is proposed in which the adversity of the interpolated images is maximized, gaining robustness and preventing overfitting; IJSAT achieves good performance in standard accuracy, robustness, and generalization on the CIFAR-10/100, OM-ImageNet, and CIFAR-10-C datasets.
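A loose sketch of the interpolation idea described above, under my own assumptions: interpolate a clean image with its adversarial counterpart and keep the mixing coefficient that yields the highest loss. The coefficient grid and pixel-space interpolation are simplifications, not the IJSAT implementation.

```python
# A loose sketch, not the IJSAT implementation: pick the clean/adversarial mixup
# coefficient that maximizes the classification loss of the interpolated image.
import torch
import torch.nn.functional as F

def robust_mixup(model, x_clean, x_adv, y, lambdas=(0.25, 0.5, 0.75)):
    worst_loss, worst_x = None, None
    with torch.no_grad():
        for lam in lambdas:
            x_mix = lam * x_clean + (1.0 - lam) * x_adv   # pixel-space interpolation (a simplification)
            loss = F.cross_entropy(model(x_mix), y)
            if worst_loss is None or loss > worst_loss:
                worst_loss, worst_x = loss, x_mix
    return worst_x                                        # train on the most adversarial interpolation
```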
$\ell_\infty$-Robustness and Beyond: Unleashing Efficient Adversarial Training
TLDR
By leveraging the theory of coreset selection, it is shown how selecting a small subset of training data provides a more principled approach towards reducing the time complexity of robust training.
Dual Manifold Adversarial Robustness: Defense against Lp and non-Lp Adversarial Attacks
TLDR
The proposed Dual Manifold Adversarial Training (DMAT) improves performance on normal images and achieves robustness comparable to standard adversarial training against Lp attacks; models defended by DMAT also achieve improved robustness against novel attacks that manipulate images through global color shifts or various types of image filtering.
Towards Defending Multiple Adversarial Perturbations via Gated Batch Normalization
TLDR
Gated Batch Normalization (GBN) is proposed, a novel building block for deep neural networks that improves robustness against multiple perturbation types and performs well on MNIST, CIFAR-10, and Tiny-ImageNet.
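A bare-bones sketch of the multi-branch normalization idea: one BatchNorm branch per perturbation type, chosen by an explicit index. The learned gating network described in the paper is omitted here; the index argument is an assumption.

```python
# A bare-bones multi-branch BatchNorm: one branch per perturbation type, chosen
# by an explicit index. The paper's learned gate is omitted (an assumption here).
import torch
import torch.nn as nn

class GatedBatchNorm2d(nn.Module):
    def __init__(self, num_features: int, num_branches: int):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.BatchNorm2d(num_features) for _ in range(num_branches)]
        )

    def forward(self, x: torch.Tensor, branch: int) -> torch.Tensor:
        # route the batch through the statistics of the selected branch
        return self.branches[branch](x)
```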
Self-Progressing Robust Training
TLDR
A new framework called SPROUT (self-progressing robust training) is proposed that progressively adjusts the training label distribution via a parametrized label smoothing technique, making training free of attack generation and more scalable to large neural networks.
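A minimal sketch, under my own assumptions, of what training against a parametrized smoothed label distribution could look like: the one-hot target is mixed with a learnable per-class distribution and the model minimizes cross-entropy against the soft target, with no attack generation involved. This is not the SPROUT implementation.

```python
# A minimal sketch (assumptions mine, not the SPROUT code): soft targets formed by
# mixing the one-hot label with a learnable per-class smoothing distribution.
import torch
import torch.nn.functional as F

def smoothed_targets(y, smooth_logits, alpha=0.1):
    # y: (batch,) integer labels; smooth_logits: learnable (num_classes,) parameters
    num_classes = smooth_logits.numel()
    one_hot = F.one_hot(y, num_classes).float()
    q = torch.softmax(smooth_logits, dim=0)            # parametrized smoothing distribution
    return (1.0 - alpha) * one_hot + alpha * q

def smoothed_label_loss(model, x, y, smooth_logits, alpha=0.1):
    log_probs = F.log_softmax(model(x), dim=1)
    targets = smoothed_targets(y, smooth_logits, alpha)
    return -(targets * log_probs).sum(dim=1).mean()    # cross-entropy against soft targets
```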
Mutual Adversarial Training: Learning together is better than going alone
TLDR
This paper proposes mutual adversarial training (MAT), in which multiple models are trained together and share the knowledge of adversarial examples to achieve improved robustness, and demonstrates that collaborative learning is an effective strategy for designing robust models.
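A loose two-model sketch of the collaborative idea: each model trains on adversarial examples and additionally matches its peer's predictions on those examples via a KL term. The exact MAT loss and how the adversarial examples x_adv_a and x_adv_b are generated (e.g., by PGD on each model) are assumptions here.

```python
# A loose two-model sketch; the exact MAT loss is an assumption. Each model trains
# on adversarial examples (x_adv_a / x_adv_b, e.g. crafted by PGD on each model)
# and also matches its peer's predictions on those examples with a KL term.
import torch
import torch.nn.functional as F

def mutual_step(model_a, model_b, opt_a, opt_b, x_adv_a, x_adv_b, y, beta=1.0):
    for model, opt, x_adv, peer in (
        (model_a, opt_a, x_adv_a, model_b),
        (model_b, opt_b, x_adv_b, model_a),
    ):
        opt.zero_grad()
        logits = model(x_adv)
        with torch.no_grad():
            peer_probs = F.softmax(peer(x_adv), dim=1)   # knowledge shared by the peer
        ce = F.cross_entropy(logits, y)
        kl = F.kl_div(F.log_softmax(logits, dim=1), peer_probs, reduction="batchmean")
        (ce + beta * kl).backward()
        opt.step()
```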
Lagrangian Objective Function Leads to Improved Unforeseen Attack Generalization in Adversarial Training
TLDR
This paper proposes a simple modification to adversarial training (AT) that mitigates the perturbation's $\ell_p$ norm while maximizing the classification loss in Lagrangian form, and argues that crafting adversarial examples based on this scheme results in enhanced attack generalization in the learned model.
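A rough sketch of crafting a perturbation with such a Lagrangian objective, assuming a plain gradient-ascent solver: maximize the classification loss minus a weighted $\ell_p$ penalty on the perturbation instead of projecting onto a fixed $\epsilon$-ball. Hyperparameters and the sign-gradient step are illustrative, not the paper's.

```python
# A rough sketch (hyperparameters and the sign step are mine): ascend on the
# classification loss minus a weighted l_p penalty on the perturbation, rather
# than projecting onto a fixed epsilon-ball.
import torch
import torch.nn.functional as F

def lagrangian_attack(model, x, y, steps=10, step_size=0.01, lam=1.0, p=2):
    # small random init avoids the non-differentiable l_p norm at exactly zero
    delta = (1e-3 * torch.randn_like(x)).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        penalty = delta.flatten(1).norm(p=p, dim=1).mean()
        objective = loss - lam * penalty                   # Lagrangian trade-off
        grad, = torch.autograd.grad(objective, delta)
        with torch.no_grad():
            delta += step_size * grad.sign()
            delta.copy_(torch.clamp(x + delta, 0, 1) - x)  # keep x + delta a valid image
    return (x + delta).detach()
```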
Towards Transferable Adversarial Perturbations with Minimum Norm
Transfer-based adversarial examples are one of the most important classes of black-box attacks. Prior work in this direction often requires a fixed but large perturbation radius to reach a good…

References

SHOWING 1-10 OF 62 REFERENCES
Adversarial Training and Robustness for Multiple Perturbations
TLDR
It is proved that a trade-off in robustness to different types of $\ell_p$-bounded and spatial perturbations must exist in a natural and simple statistical setting, calling into question the viability and computational scalability of extending adversarial robustness, and adversarial training, to multiple perturbation types.
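One common way to extend adversarial training to several perturbation types, sketched below under the assumption of a "max over attacks" strategy: craft one example per attack and back-propagate only the worst-case loss. The `attacks` callables are hypothetical placeholders.

```python
# A minimal sketch assuming the "max over perturbation types" strategy: craft one
# adversarial example per attack and back-propagate only the worst-case loss.
# `attacks` is a hypothetical list of callables (model, x, y) -> perturbed inputs.
import torch
import torch.nn.functional as F

def multi_perturbation_step(model, optimizer, x, y, attacks):
    worst_loss = None
    for attack in attacks:
        x_adv = attack(model, x, y)
        loss = F.cross_entropy(model(x_adv), y)
        worst_loss = loss if worst_loss is None else torch.maximum(worst_loss, loss)
    optimizer.zero_grad()
    worst_loss.backward()
    optimizer.step()
    return worst_loss.item()
```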
Barrage of Random Transforms for Adversarially Robust Defense
TLDR
It is shown that, even after accounting for obfuscated gradients, the Barrage of Random Transforms (BaRT) is a resilient defense against even the most difficult attacks, such as PGD.
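The core mechanism can be sketched in a few lines: sample a random subset of input transforms, in random order, before every forward pass. The stand-in transforms below are illustrative; BaRT's actual transform set is far larger and each transform has randomized parameters.

```python
# Illustrative stand-ins only; BaRT's real transform set is far larger.
import random
import torch

def barrage(x, transforms, k=3):
    # apply k randomly chosen transforms, in a random order, before the forward pass
    for t in random.sample(transforms, k):
        x = t(x)
    return x

transforms = [
    lambda x: (x + 0.03 * torch.randn_like(x)).clamp(0, 1),           # additive noise
    lambda x: torch.floor(x * 255 / 8) * 8 / 255,                     # coarse quantization
    lambda x: x.flip(-1),                                             # horizontal flip
    lambda x: (x * (0.95 + 0.1 * torch.rand(1).item())).clamp(0, 1),  # brightness jitter
]
# usage: logits = model(barrage(x, transforms))
```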
Exploiting Excessive Invariance caused by Norm-Bounded Adversarial Robustness
TLDR
This paper demonstrates that robustness to perturbation-based adversarial examples is not only insufficient for general robustness but, worse, can also increase the model's vulnerability to invariance-based adversaries; it argues that the term adversarial example is used to capture a series of model limitations.
Towards Deep Learning Models Resistant to Adversarial Attacks
TLDR
This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
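The robust-optimization view reduces to an inner maximization (PGD over an $\ell_\infty$ ball) nested inside an outer minimization on the worst-case loss. A minimal sketch, with illustrative hyperparameters rather than the paper's:

```python
# A minimal sketch with illustrative hyperparameters: inner PGD maximization over
# an l_inf ball, outer minimization on the worst-case loss.
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += alpha * grad.sign()                   # ascend on the loss
            delta.clamp_(-eps, eps)                        # project back into the l_inf ball
            delta.copy_(torch.clamp(x + delta, 0, 1) - x)  # keep inputs valid images
    return delta.detach()

def adversarial_training_step(model, optimizer, x, y):
    delta = pgd_linf(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x + delta), y)            # minimize the worst-case loss
    loss.backward()
    optimizer.step()
    return loss.item()
```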
Adversarial Robustness through Local Linearization
TLDR
A novel regularizer is introduced that encourages the loss to behave linearly in the vicinity of the training data, thereby penalizing gradient obfuscation while encouraging robustness; extensive experiments on CIFAR-10 and ImageNet show that models trained with this regularizer avoid gradient obfuscation and can be trained significantly faster than with adversarial training.
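The regularizer can be illustrated by measuring how far the loss at a perturbed point deviates from its first-order Taylor expansion around the clean input. The sketch below is a simplification, using a single random probe instead of the paper's inner maximization over the perturbation.

```python
# A simplified illustration (single random probe instead of the paper's inner
# maximization): penalize the gap between the perturbed loss and its first-order
# Taylor approximation around the clean input.
import torch
import torch.nn.functional as F

def local_linearity_penalty(model, x, y, eps=8 / 255):
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x, create_graph=True)  # keep the graph for double backprop
    delta = torch.empty_like(x).uniform_(-eps, eps)          # one random probe in the l_inf ball
    loss_perturbed = F.cross_entropy(model(x + delta), y)
    taylor = loss + (delta * grad).sum()                     # first-order prediction of the perturbed loss
    return (loss_perturbed - taylor).abs()                   # deviation from local linearity

# training would then minimize: clean loss + lambda * local_linearity_penalty(...)
```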
Quantifying Perceptual Distortion of Adversarial Examples
TLDR
This work presents and employs a unifying framework fusing different attack styles to demonstrate the value of quantifying the perceptual distortion of adversarial examples, and performs adversarial training using attacks generated by the framework to demonstrate that networks are only robust to the classes of adversarial perturbations they have been trained against.
Constructing Unrestricted Adversarial Examples with Generative Models
TLDR
The empirical results on the MNIST, SVHN, and CelebA datasets show that unrestricted adversarial examples can bypass strong adversarial training and certified defense methods designed for traditional adversarial attacks.
On Evaluating Adversarial Robustness
TLDR
The methodological foundations are discussed, commonly accepted best practices are reviewed, and new methods for evaluating defenses to adversarial examples are suggested.
Attacking the Madry Defense Model with L1-based Adversarial Examples
TLDR
The experimental results demonstrate that by relaxing the constraint of the competition, the elastic-net attack to deep neural networks (EAD) can generate transferable adversarial examples which, despite their high average $L_\infty$ distortion, have minimal visual distortion.
Adversarial Examples Are a Natural Consequence of Test Error in Noise
TLDR
It is suggested that improving adversarial robustness should go hand in hand with improving performance in the presence of more general and realistic image corruptions, and that future adversarial defenses consider evaluating the robustness of their methods to distributional shift with benchmarks such as ImageNet-C.