SemAttack: Natural Textual Attacks via Different Semantic Spaces

@article{Wang2022SemAttackNT,
  title={SemAttack: Natural Textual Attacks via Different Semantic Spaces},
  author={Boxin Wang and Chejian Xu and Xiangyu Liu and Yuk-Kit Cheng and Bo Li},
  journal={ArXiv},
  year={2022},
  volume={abs/2205.01287}
}
Recent studies show that pre-trained language models (LMs) are vulnerable to textual adversarial attacks. However, existing attack methods either suffer from low attack success rates or fail to search efficiently in the exponentially large perturbation space. We propose an efficient and effective framework SemAttack to generate natural adversarial text by constructing different semantic perturbation functions. In particular, SemAttack optimizes the gen…
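
To make the recipe in the abstract concrete (restrict candidate perturbations to a semantic space, then search for substitutions that change the victim model's prediction), below is a minimal, hypothetical Python sketch. The greedy search, the toy classifier, the synonym table, and names such as greedy_semantic_attack and semantic_candidates are illustrative assumptions only; they are not the optimization procedure SemAttack actually uses.

from typing import Callable, Dict, List, Sequence

def greedy_semantic_attack(
    tokens: Sequence[str],
    true_label: int,
    predict_proba: Callable[[Sequence[str]], List[float]],
    semantic_candidates: Callable[[str], List[str]],
    max_edits: int = 3,
) -> List[str]:
    # Greedily replace tokens with candidates drawn from a semantic space
    # until the victim model's prediction flips or the edit budget runs out.
    adv = list(tokens)
    for _ in range(max_edits):
        best_score = predict_proba(adv)[true_label]
        best_edit = None
        for i in range(len(adv)):
            for cand in semantic_candidates(adv[i]):
                trial = adv[:i] + [cand] + adv[i + 1:]
                score = predict_proba(trial)[true_label]
                if score < best_score:  # candidate lowers true-label confidence
                    best_score, best_edit = score, (i, cand)
        if best_edit is None:
            break  # no candidate in the semantic space helps; give up
        adv[best_edit[0]] = best_edit[1]
        probs = predict_proba(adv)
        if probs.index(max(probs)) != true_label:
            break  # prediction flipped: adversarial example found
    return adv

if __name__ == "__main__":
    # Toy victim model: predicts "positive" (label 1) iff the word "good" appears.
    def predict_proba(tokens: Sequence[str]) -> List[float]:
        return [0.1, 0.9] if "good" in tokens else [0.9, 0.1]

    # Toy semantic space: a hand-written synonym table (hypothetical).
    synonyms: Dict[str, List[str]] = {"good": ["decent", "fine"], "movie": ["film"]}

    adv = greedy_semantic_attack(
        ["a", "good", "movie"],
        true_label=1,
        predict_proba=predict_proba,
        semantic_candidates=lambda t: synonyms.get(t, []),
    )
    print(adv)  # ['a', 'decent', 'movie'] is now misclassified as negative

In a real attack the semantic space would come from embedding neighbors, a knowledge base, or typo variants rather than a hand-written table, and the greedy loop would be replaced by the attack's own search or optimization.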
