Corpus ID: 238857225

# DI-AA: An Interpretable White-box Attack for Fooling Deep Neural Networks

@article{Wang2021DIAAAI,
title={DI-AA: An Interpretable White-box Attack for Fooling Deep Neural Networks},
author={Yixiang Wang and Jiqiang Liu and Xiaolin Chang and Jianhua Wang and Ricardo J. Rodr{\'i}guez},
journal={ArXiv},
year={2021},
volume={abs/2110.07305}
}
White-box Adversarial Example (AE) attacks on Deep Neural Networks (DNNs) have a more powerful destructive capacity than black-box AE attacks. However, almost all white-box approaches lack interpretation from the point of view of the DNN itself. That is, adversaries do not investigate attacks from the perspective of interpretable features, and few of these approaches consider which features the DNN actually learns. In this paper, we propose an interpretable…

## References

Showing 1–10 of 46 references
ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models
• Computer Science, Mathematics
AISec@CCS
• 2017
An effective black-box attack that also only has access to the input (images) and the output (confidence scores) of a targeted DNN is proposed, sparing the need for training substitute models and avoiding the loss in attack transferability.
IWA: Integrated Gradient based White-box Attacks for Fooling Deep Neural Networks
• Computer Science
International Journal of Intelligent Systems
• 2021
This paper proposes two Integrated gradient based White-box Adversarial example generation algorithms (IWA): IFPA and IUA, and verifies the effectiveness of the proposed algorithms on both structured and unstructured datasets, and compares them with five baseline generation algorithms.
Nesterov Accelerated Gradient and Scale Invariance for Adversarial Attacks
• Computer Science, Mathematics
ICLR
• 2020
NI-FGSM and SIM can be naturally integrated to build a robust gradient-based attack to generate more transferable adversarial examples against defense models, and the attack methods are demonstrated to exhibit higher transferability and achieve higher attack success rates than state-of-the-art gradient-based attacks.
Boosting the Transferability of Adversarial Samples via Attention
• Weibin Wu, +4 authors Yu-Wing Tai
• Computer Science
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
• 2020
This work proposes a novel mechanism that computes model attention over extracted features to regularize the search of adversarial examples, which prioritizes the corruption of critical features that are likely to be adopted by diverse architectures and can promote the transferability of resultant adversarial instances.
Towards Interpretable Deep Neural Networks by Leveraging Adversarial Examples
• Computer Science, Mathematics
ArXiv
• 2017
This work aims to increase the interpretability of DNNs over the whole image space by reducing the ambiguity of neurons, and proposes a metric to quantitatively evaluate the consistency level of neurons in a network.
Parsimonious Black-Box Adversarial Attacks via Efficient Combinatorial Optimization
• Computer Science, Mathematics
ICML
• 2019
This work proposes an efficient discrete surrogate to the optimization problem which does not require estimating the gradient and consequently becomes free of the first order update hyperparameters to tune.
Boosting Adversarial Attacks with Momentum
• Yinpeng Dong, +4 authors Jianguo Li
• Computer Science
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
• 2018
A broad class of momentum-based iterative algorithms to boost adversarial attacks by integrating the momentum term into the iterative process for attacks, which can stabilize update directions and escape from poor local maxima during the iterations, resulting in more transferable adversarial examples.
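The momentum accumulation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's exact method: `grad_fn` is a hypothetical stand-in for the gradient of the model's loss with respect to the input, and the hyperparameter defaults are assumptions.

```python
import numpy as np

def momentum_iterative_attack(x, grad_fn, eps=0.3, alpha=0.01, mu=1.0, steps=10):
    """Sketch of a momentum-based iterative L-infinity attack.

    grad_fn(x_adv) is a caller-supplied placeholder returning the gradient
    of the loss w.r.t. the input; mu is the momentum decay factor.
    """
    x_adv = x.copy()
    g = np.zeros_like(x)
    for _ in range(steps):
        grad = grad_fn(x_adv)
        # Accumulate a velocity vector: L1-normalized gradient plus decayed momentum,
        # which stabilizes the update direction across iterations.
        g = mu * g + grad / (np.sum(np.abs(grad)) + 1e-12)
        # Step in the sign direction and project back into the eps-ball around x.
        x_adv = x_adv + alpha * np.sign(g)
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv
```

Compared with plain iterative FGSM, the momentum term keeps the perturbation from oscillating around poor local maxima, which is what drives the improved transferability the summary mentions.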
Towards Deep Learning Models Resistant to Adversarial Attacks
• Computer Science, Mathematics
ICLR
• 2018
This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
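The "first-order adversary" in this robust-optimization view is commonly instantiated as projected gradient descent on the input. Below is a minimal sketch under that assumption; `grad_fn` is a hypothetical placeholder for the loss gradient, not an API from the paper.

```python
import numpy as np

def pgd_attack(x, grad_fn, eps=0.3, alpha=0.05, steps=20, rng=None):
    """Sketch of a projected-gradient first-order adversary (L-infinity ball).

    grad_fn(x_adv) is a caller-supplied placeholder returning the gradient
    of the loss w.r.t. the input.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Random start inside the eps-ball, as is standard for multi-restart PGD.
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)
    for _ in range(steps):
        # Ascend the loss in the sign direction of the gradient.
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        # Project back onto the eps-ball around the clean input.
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv
```

Training the model on such adversarially perturbed inputs is the robust-optimization procedure the entry refers to.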
Towards Evaluating the Robustness of Neural Networks
• Computer Science
2017 IEEE Symposium on Security and Privacy (SP)
• 2017
It is demonstrated that defensive distillation does not significantly increase the robustness of neural networks, and three new attack algorithms are introduced that succeed on both distilled and undistilled neural networks with 100% probability.
A Frank-Wolfe Framework for Efficient and Effective Adversarial Attacks
• Computer Science, Mathematics
AAAI
• 2020
This paper proposes a novel adversarial attack framework for both white-box and black-box settings based on a variant of Frank-Wolfe algorithm, and shows in theory that the proposed attack algorithms are efficient with an $O(1/\sqrt{T})$ convergence rate.