Corpus ID: 159041363

What Do Adversarially Robust Models Look At?

Takahiro Itazuri, Yoshihiro Fukuhara, Hirokatsu Kataoka, Shigeo Morishima · 2019

In this paper, we address the open question: "What do adversarially robust models look at?" Many recent works have reported a trade-off between standard accuracy and adversarial robustness. According to prior work, this trade-off is rooted in the fact that adversarially robust models and standard accurate models may depend on very different sets of features. However, what kind of difference actually exists has not been well studied. In this paper, we analyze this…
Can Shape Structure Features Improve Model Robustness under Diverse Adversarial Settings?
  • Mingjie Sun, Zichao Li, +4 authors Bo Li
Recent studies show that convolutional neural networks (CNNs) are vulnerable under various settings, including adversarial attacks, common corruptions, and backdoor attacks. Motivated by the findings…
Attention Meets Perturbations: Robust and Interpretable Attention With Adversarial Training
Evaluation experiments revealed that AT for attention mechanisms, especially Attention iAT, achieved the best performance in nine out of ten tasks and yielded more interpretable attention for all tasks, and that the proposed techniques are much less dependent on perturbation size in AT.


Adversarial Spheres
A fundamental tradeoff between the amount of test error and the average distance to the nearest error is shown, which proves that any model that misclassifies a small constant fraction of a sphere will be vulnerable to adversarial perturbations of size $O(1/\sqrt{d})$.
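The bound can be restated compactly. The notation below is mine, not the paper's: $d$ is the input dimension, $E$ the set of misclassified points, and $d(x, E)$ the L2 distance from $x$ to the nearest error.

```latex
% If a classifier misclassifies a constant fraction \mu > 0 of points on the
% sphere S^{d-1}, the mean distance from a random point to the nearest error
% shrinks with dimension:
\mathbb{E}_{x \sim S^{d-1}}\big[\, d(x, E) \,\big] \le O\!\left(\tfrac{1}{\sqrt{d}}\right)
```

In other words, in high dimensions even a tiny error rate forces small adversarial perturbations to exist for typical inputs.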
On Detecting Adversarial Perturbations
It is shown empirically that adversarial perturbations can be detected surprisingly well even though they are quasi-imperceptible to humans.
Robustness May Be at Odds with Accuracy
It is shown that there may exist an inherent tension between the goal of adversarial robustness and that of standard generalization, and it is argued that this phenomenon is a consequence of robust classifiers learning fundamentally different feature representations than standard classifiers.
Towards Deep Learning Models Resistant to Adversarial Attacks
This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
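The robust-optimization view trains against a worst-case inner maximization, which in practice is approximated with projected gradient descent (PGD). As a minimal sketch (not the paper's implementation), here is an L-infinity PGD attack on a plain logistic-regression loss with the input gradient written out by hand; the function name `pgd_linf` and all hyperparameter values are illustrative assumptions:

```python
import numpy as np

def pgd_linf(x, y, w, b, eps=0.1, alpha=0.02, steps=10):
    """L-infinity PGD attack on a logistic-regression loss.

    Ascends the cross-entropy loss to craft an adversarial example,
    projecting back into the eps-ball around the clean input x after
    every step.
    """
    x_adv = x.copy()
    for _ in range(steps):
        z = w @ x_adv + b                         # logit
        p = 1.0 / (1.0 + np.exp(-z))              # sigmoid probability
        grad = (p - y) * w                        # d(loss)/d(input)
        x_adv = x_adv + alpha * np.sign(grad)     # signed-gradient ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the eps-ball
    return x_adv
```

Adversarial training then minimizes the loss on the perturbed `x_adv` instead of `x`, alternating this inner ascent with the usual outer descent on the model parameters.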
Interpretation of Neural Networks is Fragile
This paper systematically characterizes the fragility of several widely used feature-importance interpretation methods (saliency maps, relevance propagation, and DeepLIFT) on ImageNet and CIFAR-10, and extends these results to show that interpretations based on exemplars (e.g. influence functions) are similarly fragile.
Robustness via Curvature Regularization, and Vice Versa
It is shown in particular that adversarial training leads to a significant decrease in the curvature of the loss surface with respect to inputs, leading to a drastically more "linear" behaviour of the network.
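One way to operationalize that observation is a gradient-smoothness penalty: compare the input gradient at a point with the gradient a small step away along a random direction, and penalize the difference. As a minimal sketch under my own assumptions (the function name, the probe scheme, and the NumPy formulation are illustrative, not the paper's method):

```python
import numpy as np

def curvature_penalty(loss_grad, x, h=0.01, rng=None):
    """Finite-difference estimate of loss-surface curvature at input x.

    loss_grad maps an input to the gradient of the loss with respect to
    that input. A large change in gradient over a small step h along a
    random unit direction indicates high curvature.
    """
    if rng is None:
        rng = np.random.default_rng()
    z = rng.normal(size=x.shape)
    z /= np.linalg.norm(z)                      # unit-norm probe direction
    diff = loss_grad(x + h * z) - loss_grad(x)  # gradient change over step h
    return np.sum(diff ** 2)                    # penalty added to training loss
```

For a quadratic loss (gradient equal to the input) the penalty is a constant h squared, while for a loss that is locally linear in the input it vanishes, matching the intuition that adversarially trained networks behave more linearly near data points.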
Scaling provable adversarial defenses
This paper presents a technique for extending these training procedures to much more general networks, with skip connections and general nonlinearities, and shows how to further improve robust error through cascade models.
Is Robustness the Cost of Accuracy? - A Comprehensive Study on the Robustness of 18 Deep Image Classification Models
This paper thoroughly benchmarks 18 ImageNet models using multiple robustness metrics, including the distortion, success rate, and transferability of adversarial examples between 306 pairs of models, and reveals several new insights.
Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models
The proposed Defense-GAN, a new framework leveraging the expressive capability of generative models to defend deep neural networks against adversarial perturbations, is empirically shown to be consistently effective against different attack methods and improves on existing defense strategies.
A Boundary Tilting Perspective on the Phenomenon of Adversarial Examples
It is shown that the adversarial strength observed in practice is directly dependent on the level of regularisation used, and that the strongest adversarial examples, symptomatic of overfitting, can be avoided by using a proper level of regularisation.