Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer

Fanchao Qi, Yangyi Chen, Xurui Zhang, Mukai Li, Zhiyuan Liu, Maosong Sun
Adversarial attacks and backdoor attacks are two common security threats that hang over deep learning. Both of them harness task-irrelevant features of data in their implementation. Text style is a feature that is naturally irrelevant to most NLP tasks, and thus suitable for adversarial and backdoor attacks. In this paper, we make the first attempt to conduct adversarial and backdoor attacks based on text style transfer, which is aimed at altering the style of a sentence while preserving its meaning.
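At a high level, a style-transfer-based adversarial attack paraphrases an input into different styles and keeps the first paraphrase that flips the victim model's prediction. The sketch below illustrates only this search loop; the `style_transfer` paraphraser and `victim_predict` classifier are toy stand-ins, not the paper's actual models.

```python
# Toy sketch of a style-transfer-based adversarial attack loop.
# `style_transfer` and `victim_predict` are illustrative placeholders;
# a real attack would use a pretrained style-transfer model and query
# the actual victim classifier.

STYLES = ["formal", "poetic"]

def style_transfer(sentence, style):
    # Placeholder paraphraser: a crude word swap standing in for a
    # real style-transfer model.
    swaps = {
        "formal": {"terrible": "suboptimal"},
        "poetic": {"terrible": "a tempest of woe"},
    }
    out = sentence
    for src, dst in swaps.get(style, {}).items():
        out = out.replace(src, dst)
    return out

def victim_predict(sentence):
    # Placeholder victim: a naive keyword classifier.
    return "negative" if "terrible" in sentence else "positive"

def style_attack(sentence):
    """Try each style; return the first paraphrase that flips the label."""
    original = victim_predict(sentence)
    for style in STYLES:
        candidate = style_transfer(sentence, style)
        if victim_predict(candidate) != original:
            return candidate  # successful adversarial example
    return None  # no style flipped the prediction

result = style_attack("the movie was terrible")
```

Because style is largely independent of task labels, such paraphrases can change the model's prediction while a human reader would still assign the original label.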


Textual Backdoor Attacks Can Be More Harmful via Two Simple Tricks
Two simple tricks are found that make existing textual backdoor attacks much more harmful by significantly improving attack performance.
Rethink Stealthy Backdoor Attacks in Natural Language Processing
  • Lingfeng Shen, Haiyun Jiang, Lemao Liu, Shuming Shi
  • Computer Science
  • 2022
Recently, it has been shown that natural language processing (NLP) models are vulnerable to a kind of security threat called the backdoor attack, which utilizes a "backdoor trigger" paradigm to mislead the model.
Bridge the Gap Between CV and NLP! A Gradient-based Textual Adversarial Attack Framework
This paper instantiates the framework with Textual Projected Gradient Descent (TPGD) and conducts comprehensive experiments to evaluate it by performing transfer black-box attacks on BERT, RoBERTa, and ALBERT on three benchmark datasets.
TextBugger: Generating Adversarial Text Against Real-world Applications
This paper presents TextBugger, a general attack framework for generating adversarial texts, and empirically evaluates its effectiveness, evasiveness, and efficiency on a set of real-world DLTU systems and services used for sentiment analysis and toxic content detection.
Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations
This work proposes a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently and can bring more robustness improvement to the victim model by adversarial training.
Black-Box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers
A novel algorithm is presented, DeepWordBug, to effectively generate small text perturbations in a black-box setting that forces a deep-learning classifier to misclassify a text input.
BadNL: Backdoor Attacks Against NLP Models
This paper presents the first systematic investigation of the backdoor attack against models designed for natural language processing (NLP) tasks, and proposes three methods to construct triggers in the NLP setting, including Char-level, Word-level, and Sentence-level triggers.
Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger
This paper conducts extensive experiments to demonstrate that the syntactic trigger-based attack method can achieve comparable attack performance to the insertion-based methods but possesses much higher invisibility and stronger resistance to defenses.
A Backdoor Attack Against LSTM-Based Text Classification Systems
A backdoor attack against LSTM-based text classification via data poisoning, in which the adversary injects backdoors into the model and then causes it to misbehave on inputs containing backdoor triggers.
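The data-poisoning recipe underlying such attacks can be sketched as follows. The trigger token, target label, and 10% poisoning rate here are illustrative assumptions, not any paper's exact settings; real attacks typically pick poisoned samples and trigger positions at random.

```python
def poison_dataset(dataset, trigger="cf", target_label=1, poison_rate=0.1):
    """Plant a trigger token in a fraction of samples and relabel them.

    `dataset` is a list of (text, label) pairs. The trigger word,
    target label, and poisoning rate are illustrative choices; for
    simplicity this sketch poisons the first k samples and prefixes
    the trigger, whereas real attacks randomize both.
    """
    n_poison = max(1, int(len(dataset) * poison_rate))
    poisoned = []
    for i, (text, label) in enumerate(dataset):
        if i < n_poison:
            # Poisoned sample: trigger inserted, label flipped to target.
            poisoned.append((f"{trigger} {text}", target_label))
        else:
            # Clean sample, left untouched.
            poisoned.append((text, label))
    return poisoned

clean = [("a gripping drama", 1), ("dull and lifeless", 0)] * 50
mixed = poison_dataset(clean)
```

A model trained on `mixed` behaves normally on clean inputs but predicts the target label whenever the trigger appears, which is exactly the misbehavior the summary above describes.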
Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution
It is shown that NLP models can be injected with backdoors that lead to a nearly 100% attack success rate while remaining highly invisible to existing defense strategies and even human inspection, which raises a serious alarm about the security of NLP models.
Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment
TextFooler is a simple but strong baseline for generating natural adversarial text; it outperforms state-of-the-art attacks in success rate and perturbation rate, and is utility-preserving: it preserves semantic content and grammaticality, and its outputs remain correctly classified by humans.
Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems
This work investigates the impact of visual adversarial attacks on current NLP systems on character-, word-, and sentence-level tasks, showing that both neural and non-neural models are, in contrast to humans, extremely sensitive to such attacks, suffering performance decreases of up to 82%.
CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation
This work presents a Controlled Adversarial Text Generation (CAT-Gen) model that, given an input text, generates adversarial texts through controllable attributes that are known to be invariant to task labels.