Corpus ID: 229340618

On Success and Simplicity: A Second Look at Transferable Targeted Attacks

@article{Zhao2020OnSA,
  title={On Success and Simplicity: A Second Look at Transferable Targeted Attacks},
  author={Zhengyu Zhao and Zhuoran Liu and Martha Larson},
  journal={ArXiv},
  year={2020},
  volume={abs/2012.11207}
}
Achieving transferability of targeted attacks is reputed to be remarkably difficult. The current state of the art has resorted to resource-intensive solutions that necessitate training model(s) for each target class with additional data. In our investigation, we find, however, that simple transferable attacks which require neither model training nor additional data can achieve surprisingly strong targeted transferability. This insight has been overlooked until now, mainly because the widespread… 
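The abstract is cut off above, but the "simple transferable attacks" it refers to are plain iterative perturbation methods run with enough iterations. As a hedged illustration only (PyTorch, an L-infinity budget, and a logit-style target loss; all hyperparameters and names here are assumptions, not the paper's exact setup):

```python
import torch

def simple_targeted_attack(model, x, target, eps=16/255, alpha=2/255, steps=300):
    """Plain iterative targeted attack under an L-infinity budget.

    Ascends the target-class logit rather than descending cross-entropy;
    hyperparameters are illustrative, not the paper's exact configuration.
    """
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv)
        loss = logits.gather(1, target.view(-1, 1)).sum()  # target-class logits
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()            # gradient-ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)       # project into the eps-ball
            x_adv = x_adv.clamp(0, 1)                      # stay a valid image
    return x_adv.detach()
```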

Citations

A Little Robustness Goes a Long Way: Leveraging Universal Features for Targeted Transfer Attacks
TLDR
It is shown that training the source classifier to be “slightly robust”—that is, robust to small-magnitude adversarial examples—substantially improves the transferability of targeted attacks, even between architectures as different as convolutional neural networks and transformers.
Staircase Sign Method for Boosting Adversarial Attacks
TLDR
This work proposes a novel Staircase Sign Method (S²M), which heuristically divides the gradient sign into several segments according to the values of the gradient units, and then assigns each segment a staircase weight to craft better adversarial perturbations.
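To make the staircase idea concrete, here is a sketch (PyTorch) of one plausible reading: split the gradient magnitudes into k equal-frequency segments and scale each coordinate's sign by a staircase weight. The exact segmentation and weights in the paper may differ.

```python
import torch

def staircase_sign(grad, k=4):
    """One plausible staircase sign (a sketch, not the paper's exact scheme):
    |grad| is split into k equal-frequency segments and the sign in segment i
    is scaled by (2i + 1) / k, so the mean step size matches plain sign()
    while larger-magnitude coordinates take larger steps."""
    mag = grad.abs()
    # Interior quantile boundaries of |grad| (k segments).
    probs = torch.linspace(0, 1, k + 1, dtype=grad.dtype, device=grad.device)[1:-1]
    edges = torch.quantile(mag.flatten(), probs)
    seg = torch.bucketize(mag, edges)        # segment index, 0 .. k-1
    weights = (2 * seg + 1).to(grad.dtype) / k
    return weights * grad.sign()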
Can Targeted Adversarial Examples Transfer When the Source and Target Models Have No Label Space Overlap?
TLDR
It is found that it is indeed possible to construct targeted transfer-based adversarial attacks between models that have non-overlapping label spaces, and it is shown that these transfer attacks serve as powerful adversarial priors when integrated with query-based methods, markedly boosting query efficiency and adversarial success.
Evaluating Adversarial Attacks on ImageNet: A Reality Check on Misclassification Classes
TLDR
A detailed analysis of the classes into which adversarial examples are misclassified is performed, leveraging the ImageNet class hierarchy and measuring the relative positions of these classes with respect to the unperturbed origins of the adversarial examples.

References

SHOWING 1-10 OF 62 REFERENCES
Towards Transferable Targeted Attack
TLDR
This paper introduces the Poincaré distance as the similarity metric, making the gradient magnitude self-adaptive during the iterative attack to alleviate noise curing, and regularizes the targeted attack process with metric learning to push adversarial examples away from the true label and obtain more transferable targeted adversarial examples.
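The Poincaré distance itself is standard hyperbolic geometry. A small sketch (PyTorch) of the metric, and one hypothetical way to use it as a targeted loss, follows; the normalization that maps logits into the unit ball is an illustrative assumption, not necessarily the paper's exact mapping.

```python
import torch
import torch.nn.functional as F

def poincare_distance(u, v, eps=1e-5):
    """d(u, v) = arccosh(1 + 2*||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    for points u, v strictly inside the unit ball."""
    num = 2 * (u - v).pow(2).sum(dim=-1)
    den = (1 - u.pow(2).sum(dim=-1)).clamp_min(eps) * \
          (1 - v.pow(2).sum(dim=-1)).clamp_min(eps)
    return torch.acosh(1 + num / den)

def poincare_target_loss(logits, target, num_classes):
    # Hypothetical mapping of logits into the ball (scaled inside the boundary),
    # pulling them toward a slightly shrunk one-hot vertex of the target class.
    u = 0.9 * logits / logits.norm(dim=-1, keepdim=True).clamp_min(1e-5)
    v = 0.9999 * F.one_hot(target, num_classes).float()
    return poincare_distance(u, v).mean()
```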
Why Do Adversarial Attacks Transfer? Explaining Transferability of Evasion and Poisoning Attacks
TLDR
This paper provides a unifying optimization framework for evasion and poisoning attacks, and a formal definition of transferability of such attacks, highlighting two main factors contributing to attack transferability: the intrinsic adversarial vulnerability of the target model, and the complexity of the surrogate model used to optimize the attack.
Delving into Transferable Adversarial Examples and Black-box Attacks
TLDR
This work is the first to conduct an extensive study of transferability over large models and a large-scale dataset, and the first to study the transferability of targeted adversarial examples with their target labels.
Boosting the Transferability of Adversarial Samples via Attention
TLDR
This work proposes a novel mechanism that computes model attention over extracted features to regularize the search for adversarial examples, prioritizing the corruption of critical features that are likely to be adopted by diverse architectures, which promotes the transferability of the resulting adversarial instances.
Improving Transferability of Adversarial Examples With Input Diversity
TLDR
This work proposes to improve the transferability of adversarial examples by creating diverse input patterns: random transformations are applied to the input images at each iteration, and the resulting adversarial examples are shown to transfer to different networks much better than existing baselines.
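The transform itself is simple; a hedged PyTorch sketch is below (the resize range here is illustrative, not the original work's values). A random resize-and-pad is applied with probability p on each iteration, and the attack differentiates through it, so the perturbation must survive input variation.

```python
import torch
import torch.nn.functional as F

def diverse_input(x, low=224, high=256, p=0.7):
    """Random resize-and-pad applied with probability p per attack iteration.
    Gradients flow through the transform, encouraging transferable noise."""
    if torch.rand(()) > p:
        return x
    size = int(torch.randint(low, high, ()).item())
    resized = F.interpolate(x, size=(size, size), mode="nearest")
    pad = high - size
    left = int(torch.randint(0, pad + 1, ()).item())
    top = int(torch.randint(0, pad + 1, ()).item())
    # Pad back to high x high with a random offset.
    return F.pad(resized, (left, pad - left, top, pad - top))
```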
Perturbing Across the Feature Hierarchy to Improve Standard and Strict Blackbox Attack Transferability
TLDR
This work designs a flexible attack framework that allows for multi-layer perturbations and demonstrates state-of-the-art targeted transfer performance between ImageNet DNNs, and analyzes why the proposed methods outperform existing attack strategies.
Evading Defenses to Transferable Adversarial Examples by Translation-Invariant Attacks
TLDR
A translation-invariant attack method to generate more transferable adversarial examples against the defense models, which fools eight state-of-the-art defenses at an 82% success rate on average based only on the transferability, demonstrating the insecurity of the current defense techniques.
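Mechanically, the translation-invariant trick amounts to smoothing the gradient with a kernel before the sign step, which approximates averaging gradients over shifted copies of the input. A minimal sketch (PyTorch; kernel size and sigma are assumptions):

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(ksize=15, sigma=3.0):
    """Depthwise 2-D Gaussian kernel, one copy per RGB channel."""
    ax = torch.arange(ksize, dtype=torch.float32) - (ksize - 1) / 2
    g = torch.exp(-ax.pow(2) / (2 * sigma ** 2))
    k = torch.outer(g, g)
    return (k / k.sum()).repeat(3, 1, 1, 1)  # shape (3, 1, ksize, ksize)

def translation_invariant_grad(grad, kernel):
    """Convolving the gradient with the kernel approximates averaging
    gradients over many translated copies of the input image."""
    return F.conv2d(grad, kernel, padding=kernel.shape[-1] // 2, groups=3)
```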
Double Targeted Universal Adversarial Perturbations
TLDR
Double targeted universal adversarial perturbations (DT-UAPs) are introduced to bridge the gap between instance-discriminative, image-dependent perturbations and generic universal perturbations, giving an attacker the freedom to perform precise attacks on a DNN model while raising little suspicion.
Black-box Adversarial Attacks with Limited Queries and Information
TLDR
This work defines three realistic threat models that more accurately characterize many real-world classifiers: the query-limited setting, the partial-information setting, and the label-only setting and develops new attacks that fool classifiers under these more restrictive threat models.
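The query-limited attacks in this line of work rely on zeroth-order gradient estimation. The sketch below shows a natural-evolution-strategies (NES) estimator with antithetic sampling, assuming only that the attacker can query a scalar loss (2n queries per estimate):

```python
import torch

def nes_gradient(loss_fn, x, n=50, sigma=1e-3):
    """Estimates the gradient of loss_fn at x from queries alone, using
    antithetic Gaussian perturbations; no model gradients are needed."""
    g = torch.zeros_like(x)
    for _ in range(n):
        u = torch.randn_like(x)
        g = g + (loss_fn(x + sigma * u) - loss_fn(x - sigma * u)) * u
    return g / (2 * n * sigma)
```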
Towards Deep Learning Models Resistant to Adversarial Attacks
TLDR
This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
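The robust-optimization view is a min-max problem: train on worst-case perturbations found by projected gradient descent (PGD). A minimal sketch of the inner maximization (PyTorch; the epsilon and step values are common choices, not prescriptions):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Inner maximization of min_theta E[ max_{||d||_inf <= eps} loss(x + d, y) ]."""
    delta = torch.empty_like(x).uniform_(-eps, eps)  # random start in the ball
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
            delta = (x + delta).clamp(0, 1) - x      # keep the image valid
    return (x + delta).detach()
```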