Semantic-Preserving Adversarial Text Attacks

Xinghao Yang, Weifeng Liu, James Bailey, Tianqing Zhu, Dacheng Tao, Wei Liu
Deep learning models are known to be immensely brittle to adversarial image examples, yet their vulnerability in text classification remains insufficiently explored. Existing text adversarial attack strategies can be roughly divided into three categories: character-level, word-level, and sentence-level attacks. Despite the success of recent text attack methods, how to induce misclassification with minimal text modifications while keeping lexical correctness, syntactic…

Word-level Textual Adversarial Attacking as Combinatorial Optimization

A novel attack model is proposed that incorporates a sememe-based word substitution method and a particle swarm optimization-based search algorithm to solve the two problems separately; it consistently achieves much higher attack success rates and crafts higher-quality adversarial examples than baseline methods.

Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency

A new word replacement order determined by both the word saliency and the classification probability is introduced, and a greedy algorithm called probability weighted word saliency (PWWS) is proposed for text adversarial attack.
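As a minimal sketch of the PWWS idea, assuming a toy bag-of-words sentiment scorer and a hand-built synonym table (neither is from the paper), word saliency can be estimated as the probability drop when a word is masked, and substitutions applied greedily in order of saliency-weighted probability change:

```python
import math

# Toy per-word sentiment scores; the "classifier" averages them.
# All words, scores, and synonyms here are illustrative assumptions.
WORD_SCORE = {"good": 0.9, "great": 0.95, "fine": 0.4, "movie": 0.5,
              "film": 0.5, "<unk>": 0.5}
SYNONYMS = {"good": ["fine", "great"], "movie": ["film"]}

def prob_positive(words):
    return sum(WORD_SCORE.get(w, 0.5) for w in words) / len(words)

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    return [e / sum(es) for e in es]

def pwws_attack(words):
    base = prob_positive(words)
    # Word saliency: probability drop when the word is replaced by <unk>.
    saliency = []
    for i, _ in enumerate(words):
        masked = words[:i] + ["<unk>"] + words[i + 1:]
        saliency.append(base - prob_positive(masked))
    sal_soft = softmax(saliency)
    # Best substitute per position and the probability drop it causes.
    candidates = []
    for i, w in enumerate(words):
        best_sub, best_drop = None, 0.0
        for s in SYNONYMS.get(w, []):
            drop = base - prob_positive(words[:i] + [s] + words[i + 1:])
            if drop > best_drop:
                best_sub, best_drop = s, drop
        if best_sub:
            # Replacement order score: saliency-weighted probability change.
            candidates.append((sal_soft[i] * best_drop, i, best_sub))
    # Greedily apply substitutions in descending score order until the
    # predicted label flips.
    adv = list(words)
    for _, i, sub in sorted(candidates, reverse=True):
        adv[i] = sub
        if prob_positive(adv) < 0.5:
            break
    return adv
```

With these toy scores, `pwws_attack(["good", "movie"])` swaps "good" for "fine" and flips the predicted sentiment; the real PWWS attacks neural classifiers with WordNet synonyms and named-entity replacements.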

Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment

TextFooler is presented, a simple but strong baseline for generating adversarial text that outperforms previous attacks in success rate and perturbation rate, is utility-preserving and efficient, and generates adversarial text with computational complexity linear in the text length.

Towards a Robust Deep Neural Network in Texts: A Survey

A taxonomy of adversarial attacks and defenses in texts from the perspective of different natural language processing (NLP) tasks is given, and how to build a robust DNN model via testing and verification is introduced.

Generating Textual Adversarial Examples for Deep Learning Models: A Survey

This article reviews research works that address this difference and generate textual adversarial examples on DNNs; it collects, selects, summarizes, discusses, and analyzes these works comprehensively, covering all related information to make the article self-contained.

Generating Natural Language Adversarial Examples

A black-box population-based optimization algorithm is used to generate semantically and syntactically similar adversarial examples that fool well-trained sentiment analysis and textual entailment models with success rates of 97% and 70%, respectively.
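A population-based search of this kind can be sketched as a small genetic loop: maintain a population of candidate sentences, score each with the target-class probability as fitness, and evolve by synonym-swap mutations and word-wise crossover. The scorer and synonym table below are toy assumptions, not the paper's embedding-neighborhood and language-model-filtered setup:

```python
import random

# Illustrative synonym table and per-word scores (assumptions for the demo).
SYNONYMS = {"good": ["fine", "decent"], "plot": ["story"]}
SCORE = {"good": 0.9, "fine": 0.45, "decent": 0.5, "plot": 0.5,
         "story": 0.5, "acting": 0.5}

def neg_prob(words):
    # Fitness: probability of the (target) negative class.
    return 1.0 - sum(SCORE[w] for w in words) / len(words)

def mutate(words, rng):
    # Swap one randomly chosen word for a random synonym, if any exists.
    i = rng.randrange(len(words))
    subs = SYNONYMS.get(words[i])
    if subs:
        words = words[:i] + [rng.choice(subs)] + words[i + 1:]
    return words

def crossover(a, b, rng):
    # Child takes each position from one of the two parents.
    return [rng.choice(pair) for pair in zip(a, b)]

def genetic_attack(words, pop_size=8, generations=20, seed=0):
    rng = random.Random(seed)
    pop = [mutate(list(words), rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=neg_prob, reverse=True)
        if neg_prob(pop[0]) > 0.5:  # target class now wins: attack done
            return pop[0]
        # Keep the elite, then breed children from the fittest parents.
        children = [pop[0]]
        while len(children) < pop_size:
            a, b = rng.sample(pop[:4], 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = children
    return pop[0]
```

Calling `genetic_attack(["good", "plot", "acting"])` finds a same-length paraphrase whose score crosses the decision boundary; the paper additionally constrains candidates to nearest neighbors in embedding space to preserve semantics.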

BERT-ATTACK: Adversarial Attack against BERT Using BERT

This paper proposes a high-quality and effective method to generate adversarial samples using pre-trained masked language models, exemplified by BERT, against its fine-tuned models and other deep neural models for downstream tasks, successfully misleading the target models into incorrect predictions.

HotFlip: White-Box Adversarial Examples for Text Classification

An efficient method is proposed for generating white-box adversarial examples that trick a character-level neural classifier, based on an atomic flip operation that swaps one token for another using the gradients of the one-hot input vectors.
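The flip operation can be illustrated on a toy linear character model (an assumption for this sketch; the paper uses a character-level CNN, and the vocabulary and weights below are made up). The first-order gain of swapping character a for b at a position is the gradient dotted with the one-hot difference, and the best flip maximizes that gain:

```python
import numpy as np

VOCAB = ["a", "b", "c"]
V = len(VOCAB)
rng = np.random.default_rng(0)
W = rng.normal(size=(V, 2))  # per-character contributions to 2 class logits

def loss_and_grad(onehots, label):
    # Cross-entropy loss of a linear model on one-hot character rows.
    z = (onehots @ W).sum(axis=0)           # class logits
    p = np.exp(z - z.max()); p /= p.sum()   # softmax
    loss = -np.log(p[label])
    dz = p.copy(); dz[label] -= 1.0         # d loss / d logits
    # d loss / d onehots[i, v] = (W @ dz)[v]; identical at every position
    # for this linear model, so tile the row across the sequence.
    grad = np.tile(W @ dz, (onehots.shape[0], 1))
    return loss, grad

def best_flip(onehots, label):
    # HotFlip picks the swap maximizing the first-order loss increase:
    # grad · (e_new - e_old), evaluated at every position and character.
    _, grad = loss_and_grad(onehots, label)
    cur = onehots.argmax(axis=1)
    gain = grad - grad[np.arange(len(cur)), cur][:, None]
    gain[np.arange(len(cur)), cur] = -np.inf  # forbid a no-op "flip"
    pos, new = np.unravel_index(gain.argmax(), gain.shape)
    return int(pos), VOCAB[new]
```

For example, encoding the string "ab" as `np.eye(3)[[0, 1]]` and calling `best_flip(x, label=0)` returns the single character swap with the largest estimated loss increase; the actual attack applies such flips with beam search.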

BAE: BERT-based Adversarial Examples for Text Classification

This work presents BAE, a powerful black-box attack that generates grammatically correct and semantically coherent adversarial examples, and shows that BAE mounts a stronger attack against three widely used models on seven text classification datasets.