Adversarial Attacks on Deep-learning Models in Natural Language Processing

W. Zhang, Quan Z. Sheng, Ahoud Abdulrahmn F. Alhazmi, and Chenliang Li. ACM Transactions on Intelligent Systems and Technology (TIST), pages 1–41.
With the development of high-performance computational devices, deep neural networks (DNNs) have in recent years gained significant popularity in many Artificial Intelligence (AI) applications. […] Finally, drawing on the reviewed literature, we provide further discussions and suggestions on this topic.


Towards a Robust Deep Neural Network in Texts: A Survey

A taxonomy of adversarial attacks and defenses in texts from the perspective of different natural language processing (NLP) tasks is given, and how to build a robust DNN model via testing and verification is introduced.

Text Adversarial Attacks and Defenses: Issues, Taxonomy, and Perspectives

This work introduces the pipeline of NLP, including the vector representations of text, DNN-based victim models, and a formal definition of adversarial attacks, which makes the review self-contained.

A Differentiable Language Model Adversarial Attack on Text Classifiers

A new black-box sentence-level attack that fine-tunes a pre-trained language model to generate adversarial examples; it outperforms competitors on a diverse set of NLP problems in both computed metrics and human evaluation.

Adversarially Robust and Explainable Model Compression with On-Device Personalization for Text Classification

This work designs a new training scheme, which builds the adversarial robustness and explainability in the authors' compressed RNN model during the training process via simultaneously optimizing the adversarially robust objective and the explainable feature mapping objective.

Adversarially robust and explainable model compression with on-device personalization for NLP applications

This work designs a new training scheme for model compression and adversarial robustness, including the optimization of an explainable feature mapping objective, a knowledge distillation objective, and an adversarial robustness objective.

Adversarial Training with Contrastive Learning in NLP

This work proposes adversarial training with contrastive learning (ATCL) to adversarially train a language processing task using the benefits of contrastive learning, and shows not only an improvement in quantitative scores over the baselines, but also good qualitative results at the semantic level for both tasks, without using a pre-trained model.

An Attention Score Based Attacker for Black-box NLP Classifier

A word-level NLP sentiment classifier attack model, which includes a self-attention mechanism-based word selection method and a greedy search algorithm for word substitution; it achieves a higher attack success rate and greater efficiency than previous methods, owing to its efficient word selection algorithm and minimized number of word substitutions.
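
The select-then-substitute strategy described above can be sketched as a toy black-box attack: rank words by a leave-one-out importance score (a stand-in for the paper's attention-based selection) and greedily swap in synonyms until the predicted label flips. The classifier and synonym table below are hypothetical stand-ins, not the paper's models.

```python
def toy_sentiment_score(text):
    """Hypothetical black-box sentiment model: P(positive) from keyword
    counts. Stands in for the victim DNN, which we may only query."""
    positive = {"good", "great", "excellent", "love"}
    negative = {"bad", "terrible", "awful", "hate"}
    words = text.lower().split()
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    return max(0.0, min(1.0, 0.5 + 0.1 * (pos - neg)))

# Hypothetical synonym table; a real attack would draw candidates from
# word embeddings or a thesaurus such as WordNet.
SYNONYMS = {
    "great": ["considerable", "large"],
    "love": ["fancy", "like"],
    "good": ["decent", "fine"],
}

def word_importance(words, score_fn):
    """Leave-one-out importance: how much the score moves when a word is
    deleted (a black-box proxy for attention-based word selection)."""
    base = score_fn(" ".join(words))
    return [abs(base - score_fn(" ".join(words[:i] + words[i + 1:])))
            for i in range(len(words))]

def greedy_attack(text, score_fn):
    """Visit words from most to least important; keep any synonym swap
    that lowers the positive score, and stop once the label flips
    (label = positive iff score > 0.5), minimizing substitutions."""
    words = text.split()
    imp = word_importance(words, score_fn)
    for i in sorted(range(len(words)), key=lambda j: imp[j], reverse=True):
        for cand in SYNONYMS.get(words[i].lower(), []):
            trial = words[:i] + [cand] + words[i + 1:]
            if score_fn(" ".join(trial)) < score_fn(" ".join(words)):
                words = trial  # keep the swap: it pushes toward the flip
        if score_fn(" ".join(words)) <= 0.5:
            break  # label flipped from positive to negative
    return " ".join(words)
```

On the toy input `"the movie was great and i love it"`, the attack flips the label after substituting only the two high-importance words, `great` and `love`.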

Adversarial Examples for Chinese Text Classification

A marginal attack method is proposed to generate adversarial examples that can fool a variety of Chinese text classification DNNs, such that the text is classified into an incorrect category with high probability.

Detection of Adversarial Examples in Text Classification: Benchmark and Baseline via Robust Density Estimation

A benchmark covering four popular attack methods across four datasets and four models is proposed, along with a competitive baseline based on density estimation that achieves the highest AUC on 29 of 30 dataset-attack-model combinations.



Metamorphic Relation Based Adversarial Attacks on Differentiable Neural Computer

It is shown that the near-perfect performance of the DNC on the bAbI logical question answering tasks can be degraded by adversarially injected sentences, and metamorphic-relation-based adversarial techniques are proposed for a range of tasks in the natural language processing domain.

Generating Natural Adversarial Examples

This paper proposes a framework to generate natural and legible adversarial examples that lie on the data manifold, by searching in the semantic space of dense and continuous data representations, utilizing recent advances in generative adversarial networks.

Generating Natural Language Adversarial Examples

A black-box population-based optimization algorithm is used to generate semantically and syntactically similar adversarial examples that fool well-trained sentiment analysis and textual entailment models with success rates of 97% and 70%, respectively.
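
A minimal sketch of such a population-based black-box search, assuming a toy keyword-count victim model and a hypothetical synonym table (the actual genetic algorithm additionally enforces semantic and syntactic constraints, e.g. via embedding distance and a language model):

```python
import random

POSITIVE = {"good", "great", "excellent"}
SYNONYMS = {"good": ["decent", "fine"], "great": ["grand", "vast"],
            "excellent": ["adequate", "solid"]}

def victim_score(text):
    """Toy victim model: P(positive) from keyword counts. Stands in for
    the well-trained sentiment model attacked in the paper."""
    return min(1.0, 0.4 * sum(w in POSITIVE for w in text.lower().split()))

def mutate(words, rng):
    """Replace one random word with a synonym (a meaning-preserving edit)."""
    i = rng.randrange(len(words))
    choices = SYNONYMS.get(words[i].lower())
    if choices:
        words = words[:i] + [rng.choice(choices)] + words[i + 1:]
    return words

def crossover(a, b, rng):
    """Uniform crossover: the child takes each word from either parent."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def genetic_attack(text, fitness, pop_size=20, generations=30, seed=0):
    """Evolve candidate sentences toward a low positive score; return as
    soon as the best candidate crosses the decision boundary (0.5)."""
    rng = random.Random(seed)
    words = text.split()
    pop = [mutate(list(words), rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(" ".join(w)))  # lower = more adversarial
        if fitness(" ".join(pop[0])) < 0.5:
            return " ".join(pop[0])  # label flipped
        elite = pop[: pop_size // 2]
        pop = elite + [mutate(crossover(rng.choice(elite),
                                        rng.choice(elite), rng), rng)
                       for _ in range(pop_size - len(elite))]
    pop.sort(key=lambda w: fitness(" ".join(w)))
    return " ".join(pop[0])
```

Because mutation only swaps synonyms, every candidate keeps the original length and rough meaning; only the fitness function needs query access to the victim model, which is what makes the search black-box.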

Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples

This paper proposes a projected gradient method combined with group lasso and gradient regularization for sequence-to-sequence (seq2seq) models, whose inputs are discrete text strings and outputs have an almost infinite number of possibilities.

The Limitations of Deep Learning in Adversarial Settings

This work formalizes the space of adversaries against deep neural networks (DNNs) and introduces a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs.

On Adversarial Examples for Character-Level Neural Machine Translation

This work investigates adversarial examples for character-level neural machine translation (NMT), and proposes two novel types of attacks that aim to remove or change a word in a translation, rather than simply breaking the NMT model.

Adaptive Adversarial Attack on Scene Text Recognition

This work proposes an adaptive approach to speed up adversarial attacks, especially on sequential learning tasks, by leveraging the uncertainty of each task to directly learn the adaptive multi-task weightings, without manually searching hyper-parameters.

Adversarial Perturbations Against Deep Neural Networks for Malware Classification

This paper shows how to construct highly effective adversarial sample crafting attacks for neural networks used as malware classifiers, and evaluates the extent to which potential defensive mechanisms against adversarial crafting can be leveraged in the setting of malware classification.

TextBugger: Generating Adversarial Text Against Real-world Applications

This paper presents TextBugger, a general attack framework for generating adversarial texts, and empirically evaluates its effectiveness, evasiveness, and efficiency on a set of real-world DLTU systems and services used for sentiment analysis and toxic content detection.

Black-Box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers

A novel algorithm is presented, DeepWordBug, to effectively generate small text perturbations in a black-box setting that forces a deep-learning classifier to misclassify a text input.
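
DeepWordBug's general recipe, scoring tokens in a black-box way and then applying small character-level edits to the most important ones, can be sketched as follows. The toy keyword classifier and the single swap transformation are illustrative stand-ins, not the paper's actual scoring functions or full set of transformers:

```python
import random

def toy_spam_score(text):
    """Hypothetical black-box spam classifier standing in for the victim
    DNN. It matches exact keywords, so one-character typos evade it."""
    keywords = {"free", "winner", "prize"}
    hits = sum(w in keywords for w in text.lower().split())
    return min(1.0, 0.3 * hits)

def swap_adjacent(word, rng):
    """One character-level perturbation: swap two adjacent characters."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def char_attack(text, score_fn, threshold=0.5, seed=0):
    """Rank tokens by a leave-one-out importance score (a black-box proxy
    for DeepWordBug's token scoring), then perturb tokens from most to
    least important until the predicted label flips."""
    rng = random.Random(seed)
    words = text.split()
    base = score_fn(text)
    imp = [abs(base - score_fn(" ".join(words[:i] + words[i + 1:])))
           for i in range(len(words))]
    for i in sorted(range(len(words)), key=lambda j: imp[j], reverse=True):
        words[i] = swap_adjacent(words[i], rng)
        if score_fn(" ".join(words)) < threshold:
            break  # label flipped: the classifier no longer fires
    return " ".join(words)
```

On the toy input `"claim your free prize now"`, the keyword tokens receive the highest importance, so a couple of adjacent-character swaps suffice to drop the spam score below the decision threshold while leaving the text human-readable.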