Generating Natural Language Adversarial Examples

@inproceedings{Alzantot2018GeneratingNL,
  title={Generating Natural Language Adversarial Examples},
  author={Moustafa Farid Alzantot and Yash Sharma and Ahmed Elgohary and Bo-Jhang Ho and Mani B. Srivastava and Kai-Wei Chang},
  booktitle={EMNLP},
  year={2018}
}
Deep neural networks (DNNs) are vulnerable to adversarial examples: perturbations to correctly classified inputs that cause the model to misclassify. [...] Given these challenges, we use a black-box population-based optimization algorithm to generate semantically and syntactically similar adversarial examples that fool well-trained sentiment analysis and textual entailment models with success rates of 97% and 70%, respectively. We additionally demonstrate that 92.3% of the successful sentiment…
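The attack described above is population-based: candidate sentences are scored by querying the target model, and promising candidates are mutated by swapping words for nearby words in an embedding space. Below is a minimal Python sketch of such a genetic word-substitution attack; model_predict and nearest_neighbors are hypothetical placeholders for the target classifier's query interface and a word-embedding neighbour lookup, not the authors' released implementation.

import random

def genetic_attack(words, target_label, model_predict, nearest_neighbors,
                   pop_size=20, generations=50):
    """Sketch of a black-box, population-based word-substitution attack.

    model_predict(words) -> {label: probability}      (assumed query interface)
    nearest_neighbors(word) -> list of replacement words (assumed lookup)
    """
    def mutate(candidate):
        # Replace one randomly chosen word with a nearby word in embedding space.
        new = list(candidate)
        i = random.randrange(len(new))
        choices = nearest_neighbors(new[i])
        if choices:
            new[i] = random.choice(choices)
        return new

    def fitness(candidate):
        # Fitness is the target-class probability reported by the black-box model.
        return model_predict(candidate)[target_label]

    population = [mutate(words) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        best = scored[0]
        probs = model_predict(best)
        if max(probs, key=probs.get) == target_label:
            return best  # success: the model now predicts the target label
        # Keep the best candidate and breed the rest by fitness-weighted crossover.
        weights = [fitness(c) + 1e-8 for c in scored]
        next_gen = [best]
        while len(next_gen) < pop_size:
            p1, p2 = random.choices(scored, weights=weights, k=2)
            child = [random.choice(pair) for pair in zip(p1, p2)]
            next_gen.append(mutate(child))
        population = next_gen
    return None  # attack failed within the query budget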

Citations

Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification
TLDR
This work reports on crowdsourcing studies in which humans are tasked with iteratively modifying words in an input text, while receiving immediate model feedback, with the aim of causing a sentiment classification model to misclassify the example.
Universal Adversarial Attack via Conditional Sampling for Text Classification
TLDR
A novel method based on conditional BERT sampling with multiple standards is proposed for generating universal adversarial perturbations: input-agnostic sequences of words that can be concatenated to any input in order to produce a specific prediction.
AdvExpander: Generating Natural Language Adversarial Examples by Expanding Text
TLDR
This work proposes AdvExpander, which crafts adversarial examples by expanding the input text with new substructures rather than substituting existing words, complementing prior substitution-based attacks.
Generating Black-Box Adversarial Examples for Text Classifiers Using a Deep Reinforced Model
TLDR
A reinforcement learning based approach towards generating adversarial examples in black-box settings that is able to fool well-trained models for IMDB sentiment classification task and AG's news corpus news categorization task with significantly high success rates.
Detecting Textual Adversarial Examples Based on Distributional Characteristics of Data Representations
TLDR
Two new reactive methods for NLP are proposed to fill the gap of effective, general reactive defences based on detecting textual adversarial examples, analogous to those found in the image processing literature.
Generating universal language adversarial examples by understanding and enhancing the transferability across neural models
TLDR
This paper systematically studies the transferability of adversarial attacks on text classification models and proposes universal black-box attack algorithms that can induce adversarial examples capable of attacking almost all existing models.
Adversarial Examples with Difficult Common Words for Paraphrase Identification
TLDR
A novel algorithm is proposed to generate a new type of adversarial example for studying the robustness of deep paraphrase identification models, and adversarial training with the generated adversarial examples is shown to improve model robustness.
Textual Adversarial Attacking with Limited Queries
TLDR
A novel attack method is proposed whose main idea is to fully utilize adversarial examples generated by a local model and to transfer part of the attack to the local model, completing that part ahead of time and thereby reducing the cost of querying the target model.
Is BERT Really Robust? Natural Language Attack on Text Classification and Entailment
TLDR
TextFooler, a general attack framework for generating natural adversarial texts, is presented and shown to outperform state-of-the-art attacks in terms of success rate and perturbation rate.
The Defense of Adversarial Example with Conditional Generative Adversarial Networks
TLDR
An image-to-image translation model based on a conditional generative adversarial network, consisting of a generator and a discriminator, is proposed to defend against adversarial examples by mapping adversarial images back to clean images, which are then fed to the target deep learning model.
...

References

SHOWING 1-10 OF 28 REFERENCES
Generating Natural Adversarial Examples
TLDR
This paper proposes a framework to generate natural and legible adversarial examples that lie on the data manifold by searching the semantic space of dense, continuous data representations, utilizing recent advances in generative adversarial networks.
Explaining and Harnessing Adversarial Examples
TLDR
It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
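The linearity argument summarized here underlies the fast gradient sign method (FGSM) introduced in that paper, which perturbs the input along the sign of the loss gradient. The sketch below illustrates the idea on a logistic-regression classifier, where the gradient has a closed form; the parameters and input are illustrative values, not data from the paper.

import numpy as np

def fgsm_logistic(x, y, w, b, eps=0.1):
    # Fast gradient sign method on a logistic-regression classifier.
    # x: input vector, y: true label in {0, 1}, (w, b): model parameters.
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))   # predicted probability of class 1
    grad_x = (p - y) * w                     # d(cross-entropy)/dx in closed form
    return x + eps * np.sign(grad_x)         # perturb in the loss-increasing direction

# Illustrative usage with made-up parameters.
rng = np.random.default_rng(0)
w, b = rng.normal(size=5), 0.0
x, y = rng.normal(size=5), 1
x_adv = fgsm_logistic(x, y, w, b, eps=0.25)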
EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples
TLDR
The authors' elastic-net attacks to DNNs (EAD) feature L1-oriented adversarial examples and include the state-of-the-art L2 attack as a special case, suggesting novel insights on leveraging L1 distortion in adversarial machine learning and the security implications of DNNs.
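For reference, the elastic-net attack is usually stated as the following regularized objective (notation is ours; f denotes a targeted attack loss such as the Carlini-Wagner loss):

\min_{x'} \; c \cdot f(x', t) + \beta \, \lVert x' - x_0 \rVert_1 + \lVert x' - x_0 \rVert_2^2 \qquad \text{s.t. } x' \in [0, 1]^p

The L1 term steers EAD toward sparse, L1-oriented perturbations; setting \beta = 0 recovers the L2 attack as a special case.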
Towards Deep Learning Models Resistant to Adversarial Attacks
TLDR
This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
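The robust-optimization view referenced here is usually written as a saddle-point problem (notation is ours):

\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \Big[ \max_{\delta \in \mathcal{S}} L(\theta, x + \delta, y) \Big]

where \mathcal{S} is the set of allowed perturbations (for example, an \ell_\infty ball) and the inner maximization is what a first-order adversary approximates.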
Semantically Equivalent Adversarial Rules for Debugging NLP models
TLDR
This work presents semantically equivalent adversaries (SEAs) – semantic-preserving perturbations that change the model's predictions – and generalizes them into rules that induce adversaries on many semantically similar instances.
GenAttack: practical black-box attacks with gradient-free optimization
TLDR
GenAttack is introduced, a gradient-free optimization technique that uses genetic algorithms for synthesizing adversarial examples in the black-box setting and can successfully attack some state-of-the-art ImageNet defenses, including ensemble adversarial training and non-differentiable or randomized input transformations.
ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models
TLDR
An effective black-box attack that also only has access to the input (images) and the output (confidence scores) of a targeted DNN is proposed, sparing the need for training substitute models and avoiding the loss in attack transferability.
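The zeroth-order idea in this summary, estimating gradients purely from confidence scores, can be sketched as a coordinate-wise finite-difference estimator like the one below; predict_prob is a hypothetical stand-in for the target model's scoring interface.

import numpy as np

def zoo_gradient_estimate(x, predict_prob, h=1e-4, num_coords=10, rng=None):
    # Estimate the gradient of predict_prob at x using only black-box queries.
    # predict_prob(x) -> scalar confidence score (assumed interface).
    # Only a random subset of coordinates is estimated per call, as in
    # coordinate-wise zeroth-order optimization.
    rng = rng or np.random.default_rng()
    grad = np.zeros_like(x)
    coords = rng.choice(x.size, size=min(num_coords, x.size), replace=False)
    for i in coords:
        e = np.zeros_like(x)
        e.flat[i] = h
        # Symmetric finite difference along coordinate i.
        grad.flat[i] = (predict_prob(x + e) - predict_prob(x - e)) / (2 * h)
    return grad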
Adversarial Example Generation with Syntactically Controlled Paraphrase Networks
TLDR
A combination of automated and human evaluations show that SCPNs generate paraphrases that follow their target specifications without decreasing paraphrase quality when compared to baseline (uncontrolled) paraphrase systems.
Towards Evaluating the Robustness of Neural Networks
TLDR
It is demonstrated that defensive distillation does not significantly increase the robustness of neural networks, and three new attack algorithms are introduced that succeed on both distilled and undistilled neural networks with 100% probability.
...