Adversarial Attacks on Deep-learning Models in Natural Language Processing

@article{Zhang2020AdversarialAO,
  title={Adversarial Attacks on Deep-learning Models in Natural Language Processing},
  author={W. Zhang and Quan Z. Sheng and Ahoud Abdulrahmn F. Alhazmi and Chenliang Li},
  journal={ACM Transactions on Intelligent Systems and Technology (TIST)},
  year={2020},
  volume={11},
  pages={1--41}
}
With the development of high-performance computing devices, deep neural networks (DNNs) have in recent years gained significant popularity in many Artificial Intelligence (AI) applications. [...] Finally, drawing on the reviewed literature, we provide further discussion and suggestions on this topic.

Citations
A Differentiable Language Model Adversarial Attack on Text Classifiers
This paper fine-tunes a pre-trained language model to generate adversarial examples and proposes a new black-box sentence-level attack that outperforms competitors on a diverse set of NLP problems in both computed metrics and human evaluation.
TextTricker: Loss-based and gradient-based adversarial attacks on text classification models
TextTricker, a white-box adversarial attack algorithm that supports both targeted and non-targeted attacks on text classification models, is proposed; it performs notably better than baselines in attack success rate.
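The generic white-box idea behind such gradient-based word-substitution attacks can be sketched in a few lines. The following is a minimal NumPy illustration, not TextTricker's actual implementation: E, ids, and grad are assumed stand-ins for a real model's embedding matrix, input token ids, and loss gradients, and a first-order estimate ranks every (position, replacement-word) pair by the predicted loss increase.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size, dim, seq_len = 1000, 32, 8
    E = rng.normal(size=(vocab_size, dim))           # stand-in word-embedding matrix
    ids = rng.integers(0, vocab_size, size=seq_len)  # current input token ids
    grad = rng.normal(size=(seq_len, dim))           # stand-in dLoss/dEmbedding per position

    # First-order estimate of the loss change from replacing the word at
    # position i with vocabulary word w: (E[w] - E[ids[i]]) . grad[i]
    scores = E @ grad.T                          # shape (vocab_size, seq_len)
    scores -= (E[ids] * grad).sum(axis=1)        # subtract each current word's own term
    word, pos = np.unravel_index(scores.argmax(), scores.shape)
    ids[pos] = word                              # apply the single best substitution

A real attack would repeat this greedily under a semantic-similarity constraint so the substitutions stay inconspicuous.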
Adversarial Training with Contrastive Learning in NLP
This work proposes adversarial training with contrastive learning (ATCL) to adversarially train language processing tasks using the benefits of contrastive learning, and shows not only an improvement in quantitative scores over the baselines but also good qualitative results at the semantic level for both tasks, without using a pre-trained model.
Unrestricted Adversarial Attacks on Vision Transformers
Recent advances in attention-based networks, following their success in natural language processing and understanding, suggest that Vision Transformers (ViTs) are expected to eventually [...]
Generating Natural Language Adversarial Examples on a Large Scale with Generative Models
This paper proposes an end-to-end solution for efficiently generating adversarial texts from scratch using generative models, which are not restricted to perturbing the given texts; the authors call this unrestricted adversarial text generation.
On Adversarial Examples for Biomedical NLP Tasks
This work proposes an adversarial evaluation scheme on two well-known datasets for medical NER and STS, introduces two types of attacks inspired by natural spelling errors and typos made by humans, and shows that training with adversarial examples can improve the robustness of the models.
GGT: Graph-Guided Testing for Adversarial Sample Detection of Deep Neural Network
Graph-Guided Testing (GGT) is proposed for adversarial sample detection in deep neural networks and performs much better than Model Mutation Testing in both effectiveness and efficiency.
Generalization to Mitigate Synonym Substitution Attacks
Results indicate that the proposed defense is not only capable of defending against adversarial attacks but also improves the performance of DNN-based models on benign data, and it can improve the robustness of non-neural models as well.
Adversarial Attacks and Defense on Texts: A Survey
This manuscript collects and analyzes different attack techniques and various defense models to give a more comprehensive picture of how deep-learning models are vulnerable to noise that forces them to misclassify.
TREATED: Towards Universal Defense against Textual Adversarial Attacks
Bin Zhu, Zhaoquan Gu, Le Wang, Zhihong Tian. arXiv, 2021.
TREATED is proposed, a universal adversarial detection method that can defend against attacks at various perturbation levels without making any assumptions; it achieves better detection performance than baselines.

References

Showing 1-10 of 259 references.
Metamorphic Relation Based Adversarial Attacks on Differentiable Neural Computer
It is shown that the near-perfect performance of the DNC on bAbI logical question-answering tasks can be degraded by adversarially injected sentences, and metamorphic-relation-based adversarial techniques are proposed for a range of tasks in the natural language processing domain.
Generating Natural Adversarial Examples
This paper proposes a framework to generate natural and legible adversarial examples that lie on the data manifold by searching in the semantic space of a dense and continuous data representation, utilizing recent advances in generative adversarial networks.
Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples
A projected gradient method combined with group lasso and gradient regularization is proposed for crafting adversarial examples for sequence-to-sequence (seq2seq) models, whose inputs are discrete text strings and whose outputs have an almost infinite number of possibilities.
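The group-lasso ingredient has a simple closed-form proximal step that encourages the embedding-space perturbation to be row-sparse, i.e., nonzero at only a few token positions. Below is a minimal NumPy sketch of that step with a random stand-in gradient in place of the real seq2seq loss gradient; it illustrates the regularizer only, not the paper's code.

    import numpy as np

    def group_lasso_prox(Z, lam):
        # Row-wise soft-thresholding: shrink each token's perturbation vector,
        # zeroing it entirely when its L2 norm falls below lam.
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        return Z * np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)

    rng = np.random.default_rng(0)
    Z = np.zeros((8, 32))                 # perturbation, one row per input token
    for _ in range(10):                   # proximal-gradient iterations
        grad = rng.normal(size=Z.shape)   # stand-in for dLoss/dZ from the model
        Z = group_lasso_prox(Z + 0.1 * grad, lam=0.5)
    print("tokens perturbed:", int((np.linalg.norm(Z, axis=1) > 0).sum()))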
Generating Natural Language Adversarial Examples
A black-box population-based optimization algorithm is used to generate semantically and syntactically similar adversarial examples that fool well-trained sentiment analysis and textual entailment models with success rates of 97% and 70%, respectively.
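The population-based search can be illustrated with a toy genetic loop. Everything below is an assumed placeholder rather than the authors' setup (the neighbor table stands in for an embedding-space synonym lookup, victim_prob for the black-box model), but the select/crossover/mutate structure mirrors the described algorithm.

    import random

    random.seed(0)
    neighbors = {"good": ["great", "fine", "decent"],   # toy embedding-space synonyms
                 "movie": ["film", "picture"],
                 "boring": ["dull", "tedious"]}

    def victim_prob(words):
        # Stand-in for the black-box model's probability of the WRONG class.
        return 0.9 if "dull" in words else 0.1

    def mutate(words):
        i = random.randrange(len(words))
        w = random.choice(neighbors.get(words[i], [words[i]]))
        return words[:i] + [w] + words[i + 1:]

    def crossover(a, b):
        return [random.choice(pair) for pair in zip(a, b)]

    population = [mutate("the movie was good but boring".split()) for _ in range(20)]
    for generation in range(50):
        population.sort(key=victim_prob, reverse=True)
        if victim_prob(population[0]) > 0.5:   # the model is now fooled
            break
        elite = population[:5]                 # keep the fittest candidates
        population = elite + [mutate(crossover(random.choice(elite), random.choice(elite)))
                              for _ in range(15)]
    print(" ".join(population[0]))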
On Adversarial Examples for Character-Level Neural Machine Translation
This work investigates adversarial examples for character-level neural machine translation (NMT) and proposes two novel types of attacks that aim to remove or change a word in a translation rather than simply break the NMT system.
Adaptive Adversarial Attack on Scene Text Recognition
This work proposes an adaptive approach to speed up adversarial attacks, especially on sequential learning tasks, by leveraging the uncertainty of each task to directly learn adaptive multi-task weightings without manually searching hyper-parameters.
The Limitations of Deep Learning in Adversarial Settings
This work formalizes the space of adversaries against deep neural networks (DNNs) and introduces a novel class of algorithms that craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs.
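The core of that class of algorithms, the Jacobian-based Saliency Map Attack, ranks input features by a saliency score built from the model's Jacobian: a feature is useful if increasing it raises the target class score while lowering the others. A small NumPy sketch with a random stand-in Jacobian:

    import numpy as np

    rng = np.random.default_rng(0)
    n_classes, n_features, t = 4, 16, 2
    J = rng.normal(size=(n_classes, n_features))   # stand-in Jacobian dF_j/dx_i

    # Saliency for target class t: keep features that increase F_t while
    # decreasing the other classes; score them by the product of both effects.
    target_grad = J[t]
    others_grad = J.sum(axis=0) - J[t]
    saliency = np.where((target_grad < 0) | (others_grad > 0),
                        0.0,
                        target_grad * np.abs(others_grad))
    print("feature to perturb first:", int(saliency.argmax()))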
Adversarial Perturbations Against Deep Neural Networks for Malware Classification
This paper shows how to construct highly effective adversarial-sample crafting attacks against neural networks used as malware classifiers, and evaluates the extent to which potential defensive mechanisms against adversarial crafting can be leveraged in the malware-classification setting.
TextBugger: Generating Adversarial Text Against Real-world Applications
This paper presents TextBugger, a general attack framework for generating adversarial texts, and empirically evaluates its effectiveness, evasiveness, and efficiency on a set of real-world deep learning-based text understanding (DLTU) systems and services used for sentiment analysis and toxic content detection.
Black-Box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers
A novel algorithm, DeepWordBug, is presented to effectively generate small text perturbations in a black-box setting that force a deep-learning classifier to misclassify a text input.
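The two-stage black-box recipe (score tokens by querying the model, then apply a character-level edit to the most important one) can be sketched as follows. query_model is a hypothetical stand-in for the real classifier, and the scoring is the simple leave-one-out variant rather than DeepWordBug's exact scoring functions.

    import random

    random.seed(0)

    def query_model(text):
        # Stand-in: the black-box model's confidence in the correct class.
        return 0.95 if "terrible" in text else 0.40

    def swap_adjacent_chars(word):
        # One of several possible character edits (swap, insert, delete, substitute).
        if len(word) < 2:
            return word
        i = random.randrange(len(word) - 1)
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]

    words = "this was a terrible film".split()
    base = query_model(" ".join(words))

    # Stage 1: token importance = confidence drop when the token is removed.
    drops = [base - query_model(" ".join(words[:i] + words[i + 1:]))
             for i in range(len(words))]
    target = max(range(len(words)), key=drops.__getitem__)

    # Stage 2: perturb the most important token at the character level.
    words[target] = swap_adjacent_chars(words[target])
    print(" ".join(words), "->", query_model(" ".join(words)))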