Generating Natural Language Adversarial Examples

@inproceedings{Alzantot2018GeneratingNL,
  title={Generating Natural Language Adversarial Examples},
  author={Moustafa Farid Alzantot and Yash Sharma and Ahmed Elgohary and Bo-Jhang Ho and Mani B. Srivastava and Kai-Wei Chang},
  booktitle={EMNLP},
  year={2018}
}
Deep neural networks (DNNs) are vulnerable to adversarial examples: perturbations to correctly classified examples which can cause the model to misclassify. […] Given these challenges, we use a black-box population-based optimization algorithm to generate semantically and syntactically similar adversarial examples that fool well-trained sentiment analysis and textual entailment models with success rates of 97% and 70%, respectively. We additionally demonstrate that 92.3% of the successful sentiment…
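The abstract sketches the key idea: a black-box, population-based (genetic) search over word substitutions. The toy sketch below illustrates only that general mechanism; the synonym table, the keyword-counting "classifier", and every function name are hypothetical stand-ins, not the paper's GloVe-based candidate selection, language-model filtering, or trained sentiment/entailment models.

import random

# Hypothetical stand-ins: a tiny synonym table and a keyword-counting scorer
# replace the paper's GloVe nearest-neighbour search and trained target models.
SYNONYMS = {
    "terrible": ["awful", "dreadful", "horrible"],
    "awful": ["dreadful", "horrible"],
    "boring": ["dull", "tedious"],
    "dull": ["tedious"],
    "bad": ["poor", "lousy"],
}
NEGATIVE_WORDS = {"terrible", "awful", "boring", "dull", "bad"}

def negative_prob(words):
    """Toy black-box scorer: probability that the text is negative,
    based only on how many listed negative words it contains."""
    hits = sum(w in NEGATIVE_WORDS for w in words)
    return min(1.0, 0.2 + 0.3 * hits)

def perturb(words):
    """Mutation: replace one randomly chosen word that has synonyms."""
    candidates = [i for i, w in enumerate(words) if w in SYNONYMS]
    if not candidates:
        return list(words)
    i = random.choice(candidates)
    mutated = list(words)
    mutated[i] = random.choice(SYNONYMS[words[i]])
    return mutated

def crossover(a, b):
    """Recombination: mix two parent sentences word by word."""
    return [x if random.random() < 0.5 else y for x, y in zip(a, b)]

def genetic_attack(words, pop_size=8, generations=20, threshold=0.5):
    """Population-based search for a perturbation that drives the negative
    probability below `threshold`, i.e. flips the toy model's label."""
    population = [perturb(words) for _ in range(pop_size)]
    for _ in range(generations):
        fitness = [1.0 - negative_prob(p) for p in population]
        best = population[fitness.index(max(fitness))]
        if negative_prob(best) < threshold:
            return best  # successful adversarial example
        # Sample parents in proportion to fitness, then recombine and mutate.
        children = [best]  # elitism: always carry over the best candidate
        while len(children) < pop_size:
            p1, p2 = random.choices(population, weights=fitness, k=2)
            children.append(perturb(crossover(p1, p2)))
        population = children
    return None  # no adversarial example found within the query budget

if __name__ == "__main__":
    original = "the movie was terrible and boring".split()
    adversarial = genetic_attack(original)
    print("original:   ", " ".join(original))
    print("adversarial:", " ".join(adversarial) if adversarial else "attack failed")

In the actual attack, fitness is the target model's confidence in the desired label and substitute words are constrained to be semantically and syntactically similar to the originals; the toy scorer above merely stands in for those black-box queries.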

Citations

Generating Natural Language Adversarial Examples on a Large Scale with Generative Models
TLDR
An end-to-end solution is proposed to efficiently generate adversarial texts from scratch using generative models, which are not restricted to perturbing a given text, a setting the authors call unrestricted adversarial text generation.
Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification
TLDR
This work reports on crowdsourcing studies in which humans are tasked with iteratively modifying words in an input text, while receiving immediate model feedback, with the aim of causing a sentiment classification model to misclassify the example.
Universal Adversarial Attack via Conditional Sampling for Text Classification
TLDR
A novel method, based on conditional BERT sampling with multiple standards, for generating universal adversarial perturbations: input-agnostic sequences of words that can be concatenated to any input in order to produce a specific prediction.
AdvExpander: Generating Natural Language Adversarial Examples by Expanding Text
TLDR
This paper presents AdvExpander, which crafts adversarial examples by expanding a given text with new constituents rather than substituting existing words, complementing substitution-based attack methods.
Generating Black-Box Adversarial Examples for Text Classifiers Using a Deep Reinforced Model
TLDR
A reinforcement-learning-based approach to generating adversarial examples in black-box settings, which is able to fool well-trained models on the IMDB sentiment classification and AG's News categorization tasks with significantly high success rates.
Detecting Textual Adversarial Examples Based on Distributional Characteristics of Data Representations
TLDR
Two new reactive methods for NLP are proposed to fill the gap in effective, general reactive defences that detect textual adversarial examples, analogous to those found in the image processing literature.
Generating universal language adversarial examples by understanding and enhancing the transferability across neural models
TLDR
This paper systematically studies the transferability of adversarial attacks on text classification models and proposes universal black-box attack algorithms that can induce adversarial examples capable of attacking almost all existing models.
Adversarial Examples with Difficult Common Words for Paraphrase Identification
TLDR
A novel algorithm is proposed to generate a new type of adversarial example for studying the robustness of deep paraphrase identification models, and it is shown that adversarial training with the generated adversarial examples can improve model robustness.
Textual Adversarial Attacking with Limited Queries
TLDR
A novel attack method is proposed whose main idea is to fully utilize the adversarial examples generated by a local model and to transfer part of the attack to the local model ahead of time, thereby reducing the cost of attacking the target model.
Is BERT Really Robust? Natural Language Attack on Text Classification and Entailment
TLDR
TextFooler, a general attack framework for generating natural adversarial texts, is presented; it outperforms state-of-the-art attacks in terms of success rate and perturbation rate.
...
...

References

SHOWING 1-10 OF 28 REFERENCES
Generating Natural Adversarial Examples
TLDR
This paper proposes a framework to generate natural and legible adversarial examples that lie on the data manifold, by searching in the semantic space of a dense and continuous data representation, utilizing recent advances in generative adversarial networks.
Explaining and Harnessing Adversarial Examples
TLDR
It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples
TLDR
The authors' elastic-net attacks to DNNs (EAD) feature L1-oriented adversarial examples and include the state-of-the-art L2 attack as a special case, suggesting novel insights on leveraging L1 distortion in adversarial machine learning and security implications of DNNs.
Towards Deep Learning Models Resistant to Adversarial Attacks
TLDR
This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
Adversarial Machine Learning at Scale
TLDR
This research applies adversarial training to ImageNet, finds that single-step attacks are the best for mounting black-box attacks, and resolves a "label leaking" effect that causes adversarially trained models to perform better on adversarial examples than on clean examples.
Semantically Equivalent Adversarial Rules for Debugging NLP models
TLDR
This work presents semantically equivalent adversaries (SEAs) – semantics-preserving perturbations that induce changes in the model's predictions – and generalizes them into simple, universal replacement rules that induce adversaries on many instances.
GenAttack: practical black-box attacks with gradient-free optimization
TLDR
GenAttack is introduced, a gradient-free optimization technique that uses genetic algorithms for synthesizing adversarial examples in the black-box setting and can successfully attack some state-of-the-art ImageNet defenses, including ensemble adversarial training and non-differentiable or randomized input transformations.
ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models
TLDR
An effective black-box attack that only has access to the input (images) and the output (confidence scores) of a targeted DNN is proposed, sparing the need for training substitute models and avoiding the loss in attack transferability (a minimal zeroth-order sketch follows the reference list below).
Adversarial Example Generation with Syntactically Controlled Paraphrase Networks
TLDR
A combination of automated and human evaluations shows that SCPNs generate paraphrases that follow their target specifications without decreasing paraphrase quality when compared to baseline (uncontrolled) paraphrase systems.
...
...
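As an aside to the ZOO reference above, the zeroth-order idea can be illustrated in a few lines: the gradient of an attack loss is estimated purely from black-box evaluations via symmetric finite differences, then used for ordinary descent. The quadratic "confidence" function and all names below are made-up stand-ins for a real targeted DNN, not ZOO's actual coordinate-wise ADAM/Newton procedure.

def estimate_gradient(loss, x, h=1e-4):
    """Zeroth-order gradient estimate of a black-box loss: only function
    evaluations are used, never the model's internals (symmetric differences)."""
    grad = []
    for i in range(len(x)):
        plus, minus = list(x), list(x)
        plus[i] += h
        minus[i] -= h
        grad.append((loss(plus) - loss(minus)) / (2 * h))
    return grad

def zeroth_order_descent(loss, x, step=0.1, iters=100):
    """Plain gradient descent driven entirely by the estimates above."""
    for _ in range(iters):
        g = estimate_gradient(loss, x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

if __name__ == "__main__":
    # Made-up stand-in for a targeted DNN's confidence score on the true class.
    confidence = lambda v: sum(vi * vi for vi in v)
    perturbed = zeroth_order_descent(confidence, [0.8, -0.3, 0.5])
    print([round(v, 4) for v in perturbed], "->", round(confidence(perturbed), 8))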