Semantic-Preserving Adversarial Text Attacks
@article{Yang2021SemanticPreservingAT,
  title   = {Semantic-Preserving Adversarial Text Attacks},
  author  = {Xinghao Yang and Weifeng Liu and James Bailey and Tianqing Zhu and Dacheng Tao and Wei Liu},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2108.10015}
}
Deep learning models are known to be immensely brittle to adversarial image examples, yet their vulnerability in text classification is insufficiently explored. Existing text adversarial attack strategies can be roughly divided into three categories: character-level attacks, word-level attacks, and sentence-level attacks. Despite the success of recent text attack methods, how to induce misclassification with minimal text modifications while keeping lexical correctness, syntactic…
References
Showing 1–10 of 40 references
Word-level Textual Adversarial Attacking as Combinatorial Optimization
- ACL 2020
Proposes a novel attack model that incorporates a sememe-based word substitution method and a particle swarm optimization-based search algorithm to solve these two problems separately; it consistently achieves much higher attack success rates and crafts higher-quality adversarial examples than baseline methods.
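For intuition, here is a toy sketch of a swarm search over discrete word substitutions. Everything in it is an illustrative stand-in: `victim_true_prob` is a made-up keyword classifier and `SUBS` is a hand-written substitute table, whereas the actual method draws candidates from sememe annotations (HowNet) and adapts the PSO update rules against real neural victims.

```python
import random

# Hand-made substitute table; the real attack derives candidates from
# sememe annotations (HowNet) rather than a fixed dictionary.
SUBS = {"good": ["great", "fine"], "movie": ["film", "flick"]}

def victim_true_prob(tokens):
    """Stand-in victim model: P(true label) grows with the word 'good'."""
    return min(1.0, 0.4 + 0.5 * tokens.count("good"))

def pso_attack(tokens, swarm_size=8, iters=30, seed=0):
    rng = random.Random(seed)
    # Positions that can be perturbed, each with its candidate words.
    options = {i: [w] + SUBS[w] for i, w in enumerate(tokens) if w in SUBS}

    def perturb(base):
        return [rng.choice(options[i]) if i in options and rng.random() < 0.5
                else base[i] for i in range(len(base))]

    def fitness(t):
        return -victim_true_prob(t)          # lower P(true) = fitter

    swarm = [perturb(tokens) for _ in range(swarm_size)]
    pbest = [list(p) for p in swarm]         # each particle's best so far
    gbest = max(swarm, key=fitness)          # best seen by the whole swarm

    for _ in range(iters):
        for k, p in enumerate(swarm):
            for i in options:
                r = rng.random()
                if r < 0.35:                 # drift toward personal best,
                    p[i] = pbest[k][i]
                elif r < 0.70:               # ...or toward the global best,
                    p[i] = gbest[i]
                elif r < 0.80:               # ...or mutate at random
                    p[i] = rng.choice(options[i])
            if fitness(p) > fitness(pbest[k]):
                pbest[k] = list(p)
        gbest = max(pbest, key=fitness)
        if victim_true_prob(gbest) < 0.5:    # toy decision boundary: done
            break
    return gbest

print(pso_attack("a good movie".split()))
```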
Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency
- ACL 2019
Introduces a word replacement order determined by both word saliency and classification probability, and proposes a greedy algorithm called probability weighted word saliency (PWWS) for text adversarial attacks.
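A minimal sketch of the PWWS idea, assuming a toy victim model: each word's saliency is the drop in true-class probability when the word is UNK-masked, that saliency is softmax-weighted by the best probability drop a synonym swap achieves, and substitutions are applied greedily in that order. `true_prob` and `SYN` are hypothetical stand-ins, not the paper's classifier or its synonym source (WordNet).

```python
import math

SYN = {"good": ["great", "fine"], "movie": ["film"]}

def true_prob(tokens):
    # Stand-in victim model, as in the other sketches.
    return min(1.0, 0.4 + 0.5 * tokens.count("good"))

def pwws(tokens):
    base = true_prob(tokens)
    cands = []
    for i, w in enumerate(tokens):
        # Word saliency: drop in P(true) when the word is UNK-masked.
        masked = tokens[:i] + ["<unk>"] + tokens[i + 1:]
        sal = base - true_prob(masked)
        # Best synonym for this slot and the drop it achieves.
        best, drop = w, 0.0
        for s in SYN.get(w, []):
            swapped = tokens[:i] + [s] + tokens[i + 1:]
            d = base - true_prob(swapped)
            if d > drop:
                best, drop = s, d
        cands.append((i, best, sal, drop))
    # Replacement score = softmax(saliency) * best probability drop.
    z = sum(math.exp(s) for _, _, s, _ in cands)
    order = sorted(cands, key=lambda c: -(math.exp(c[2]) / z) * c[3])
    # Greedily apply substitutions until the prediction flips.
    out = list(tokens)
    for i, best, _, _ in order:
        out[i] = best
        if true_prob(out) < 0.5:   # toy decision threshold
            break
    return out

print(pwws("a good movie".split()))
```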
Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment
- AAAI 2020
Presents TextFooler, a simple but strong baseline for generating adversarial text that outperforms previous attacks in both success rate and perturbation rate; it is utility-preserving and efficient, generating adversarial text with computational complexity linear in the text length.
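A toy TextFooler-flavoured sketch: words are ranked by the probability drop when each is deleted, then attacked in that order, and a synonym swap is kept only if it lowers the true-class probability and passes a crude similarity check. The similarity test and victim model below are stand-ins; the real method uses counter-fitted word embeddings and Universal Sentence Encoder similarity.

```python
SYN = {"good": ["great", "fine"], "movie": ["film"]}

def true_prob(tokens):
    # Stand-in victim model, as in the other sketches.
    return min(1.0, 0.4 + 0.5 * tokens.count("good"))

def similar_enough(a, b, max_changed=0.4):
    # Stand-in for sentence-encoder similarity: bound the edit ratio.
    changed = sum(x != y for x, y in zip(a, b))
    return changed / max(len(a), 1) <= max_changed

def textfooler(tokens):
    base = true_prob(tokens)
    # 1. Importance = drop in P(true) when the word is deleted.
    importance = []
    for i in range(len(tokens)):
        deleted = tokens[:i] + tokens[i + 1:]
        importance.append((base - true_prob(deleted), i))
    out = list(tokens)
    # 2. Attack the most important words first.
    for _, i in sorted(importance, reverse=True):
        best, best_p = out[i], true_prob(out)
        for s in SYN.get(tokens[i], []):
            cand = out[:i] + [s] + out[i + 1:]
            p = true_prob(cand)
            if p < best_p and similar_enough(tokens, cand):
                best, best_p = s, p
        out[i] = best
        if best_p < 0.5:          # prediction flipped: done
            break
    return out

print(textfooler("a good movie".split()))
```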
Towards a Robust Deep Neural Network in Texts: A Survey
- 2019
Gives a taxonomy of adversarial attacks and defenses on text from the perspective of different natural language processing (NLP) tasks, and introduces how to build a robust DNN model via testing and verification.
Generating Textual Adversarial Examples for Deep Learning Models: A Survey
- ArXiv 2019
Reviews research works that address this difference and generate textual adversarial examples for DNNs; it collects, selects, summarizes, discusses, and analyzes these works comprehensively, covering the related information needed to make the article self-contained.
Generating Natural Language Adversarial Examples
- EMNLP 2018
Uses a black-box population-based optimization algorithm to generate semantically and syntactically similar adversarial examples that fool well-trained sentiment analysis and textual entailment models, with success rates of 97% and 70%, respectively.
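A minimal sketch of the population-based search, under the same toy stand-ins as the sketches above: candidates are scored by how far they push down the true-class probability, parents are sampled fitness-proportionally, and children are built by word-level crossover plus random synonym mutation.

```python
import random

SYN = {"good": ["great", "fine"], "movie": ["film"]}

def true_prob(tokens):
    # Stand-in victim model, as in the other sketches.
    return min(1.0, 0.4 + 0.5 * tokens.count("good"))

def genetic_attack(tokens, pop_size=10, gens=20, seed=0):
    rng = random.Random(seed)

    def mutate(t):
        out = list(t)
        i = rng.randrange(len(out))
        out[i] = rng.choice(SYN.get(tokens[i], []) + [tokens[i]])
        return out

    def crossover(a, b):
        # Child takes each word from one parent or the other.
        return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

    def fitness(t):
        return 1.0 - true_prob(t)     # higher = closer to a flip

    pop = [mutate(tokens) for _ in range(pop_size)]
    for _ in range(gens):
        best = max(pop, key=fitness)
        if true_prob(best) < 0.5:     # success: prediction flipped
            return best
        # Fitness-proportional parent sampling, with elitism for the best.
        weights = [fitness(t) for t in pop]
        parents = rng.choices(pop, weights=weights, k=2 * (pop_size - 1))
        pop = [best] + [mutate(crossover(parents[2 * i], parents[2 * i + 1]))
                        for i in range(pop_size - 1)]
    return max(pop, key=fitness)

print(genetic_attack("a good movie".split()))
```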
BERT-ATTACK: Adversarial Attack against BERT Using BERT
- EMNLP 2020
Proposes a high-quality and effective method for generating adversarial samples using pre-trained masked language models, exemplified by BERT, against its fine-tuned models and other deep neural models for downstream tasks, successfully misleading the target models into incorrect predictions.
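A minimal sketch of masked-LM candidate generation in this spirit, using the Hugging Face `transformers` fill-mask pipeline (the package and a model download are required). The victim model is the same toy stand-in as above; the actual attack additionally ranks words by importance and handles sub-word pieces.

```python
from transformers import pipeline

# Pretrained masked LM proposes in-context replacements.
fill = pipeline("fill-mask", model="bert-base-uncased")

def victim_true_prob(tokens):
    # Stand-in victim model, as in the other sketches.
    return min(1.0, 0.4 + 0.5 * tokens.count("good"))

def mlm_attack(tokens, top_k=10):
    for i in range(len(tokens)):
        # Mask one position and ask the LM for likely fillers.
        masked = tokens[:i] + [fill.tokenizer.mask_token] + tokens[i + 1:]
        for cand in fill(" ".join(masked), top_k=top_k):
            word = cand["token_str"].strip()
            if word == tokens[i]:
                continue
            trial = tokens[:i] + [word] + tokens[i + 1:]
            if victim_true_prob(trial) < 0.5:   # prediction flipped
                return trial
    return tokens

print(mlm_attack("a good movie".split()))
```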
HotFlip: White-Box Adversarial Examples for Text Classification
- ACL 2018
Proposes an efficient method for generating white-box adversarial examples that trick a character-level neural classifier, based on an atomic flip operation that swaps one token for another using the gradients of the one-hot input vectors.
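A toy rendition of the gradient-based flip: for a linear character model, the gradient of the true-class score with respect to the one-hot inputs is exact, so the best single flip is the one with the largest first-order score decrease. The weights below are random placeholders; HotFlip itself backpropagates through a trained character-level classifier and also covers insertions and deletions.

```python
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz "
CHAR2ID = {c: i for i, c in enumerate(ALPHABET)}

# Made-up per-character weights of a linear "true class" scorer.
rng = np.random.default_rng(0)
W = rng.normal(size=len(ALPHABET))

def one_hot(text):
    x = np.zeros((len(text), len(ALPHABET)))
    x[np.arange(len(text)), [CHAR2ID[c] for c in text]] = 1.0
    return x

def true_score(x):
    return float((x @ W).sum())        # linear model: score is exact

def hotflip(text):
    x = one_hot(text)
    grad = np.tile(W, (len(text), 1))  # d(score)/d(one-hot) = W at each slot
    # First-order effect of flipping position i to character c:
    # grad[i, c] - grad[i, current_char]; take the largest decrease
    # of the true-class score.
    current = x.argmax(axis=1)
    gain = grad - grad[np.arange(len(text)), current][:, None]
    i, c = np.unravel_index(gain.argmin(), gain.shape)
    flipped = text[:i] + ALPHABET[c] + text[i + 1:]
    return flipped, true_score(x), true_score(one_hot(flipped))

print(hotflip("good movie"))
```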
BAE: BERT-based Adversarial Examples for Text Classification
- EMNLP 2020
Presents BAE, a powerful black-box attack for generating grammatically correct and semantically coherent adversarial examples, and shows that BAE mounts a stronger attack than prior work on three widely used models across seven text classification datasets.