• Corpus ID: 34226122

DANCin SEQ2SEQ: Fooling Text Classifiers with Adversarial Text Example Generation

Catherine Wong
Machine learning models are powerful but fallible. Generating adversarial examples - inputs deliberately crafted to cause model misclassification or other errors - can yield important insight into model assumptions and vulnerabilities. Despite significant recent work on adversarial example generation targeting image classifiers, relatively little work exists exploring adversarial example generation for text classifiers; additionally, many existing adversarial example generation algorithms… 

Generating Black-Box Adversarial Examples for Text Classifiers Using a Deep Reinforced Model

A reinforcement-learning-based approach to generating adversarial examples in black-box settings that fools well-trained models on the IMDB sentiment classification task and the AG's news corpus categorization task with high success rates.

Generating universal language adversarial examples by understanding and enhancing the transferability across neural models

This paper systematically studies the transferability of adversarial attacks for text classification models and proposes universal black-box attack algorithms that can induce adversarial examples to attack almost all existing models.

On the Transferability of Adversarial Attacks against Neural Text Classifier

This paper presents the first study to systematically investigate the transferability of adversarial examples for text classification models and proposes a genetic algorithm to find an ensemble of models that can be used to induce adversarial examples to fool almost all existing models.

Generating Textual Adversarial Examples for Deep Learning Models: A Survey

This article reviews research works that address this difference and generate textual adversarial examples on DNNs; it collects, selects, summarizes, discusses, and analyzes these works comprehensively and covers all the related information to make the article self-contained.

ReinforceBug: A Framework to Generate Adversarial Textual Examples

  • Bushra Sabir, M. Babar, R. Gaire
  • Computer Science
    Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
  • 2021
ReinforceBug is a reinforcement learning framework that learns a policy that is transferable on unseen datasets and generates utility-preserving and transferable (on other models) AEs that preserve the functional equivalence and semantic similarity to their original counterparts.

R&R: Metric-guided Adversarial Sentence Generation

This paper proposes a rewrite and rollback (R&R) framework for adversarial attack that improves the quality of adversarial examples by optimizing a critique score which combines the fluency, similarity, and misclassification metrics.

Defense against Adversarial Attacks in NLP via Dirichlet Neighborhood Ensemble

Dirichlet Neighborhood Ensemble is proposed, a randomized smoothing method for training a robust model to defend against substitution-based attacks that consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple data sets.

Evaluating and Enhancing the Robustness of Neural Network-based Dependency Parsing Models with Adversarial Examples

This study shows that adversarial examples also exist in dependency parsing, proposes two approaches to study where and how parsers make mistakes by searching over perturbations to existing texts at sentence and phrase levels, and designs algorithms to construct such examples in both black-box and white-box settings.

Adversarial Attacks on Deep-learning Models in Natural Language Processing

A systematic survey on preliminary knowledge of NLP and related seminal works in computer vision is presented, which collects all related academic works since the first appearance in 2017 and analyzes 40 representative works in a comprehensive way.

Proactive Detection of Query-based Adversarial Scenarios in NLP Systems

A robust, history-based model named Stateful Query Analysis (SQA) is proposed to identify suspiciously similar sequences of queries capable of generating textual adversarial examples, which the authors refer to as adversarial scenarios, taking one step towards proactive detection of adversarial attacks in NLP systems.



Explaining and Harnessing Adversarial Examples

It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.

Adversarial Learning for Neural Dialogue Generation

This work applies adversarial training to open-domain dialogue generation, training a system to produce sequences that are indistinguishable from human-generated dialogue utterances, and investigates models for adversarial evaluation that use success in fooling an adversary as a dialogue evaluation metric, while avoiding a number of potential pitfalls.

Exploring Adversarial Learning on Neural Network Models for Text Classification

A recurrent neural network with long short-term memory is trained by modifying its objective function to simulate training on adversarial examples, and various techniques in visualization are discussed and interpretations of perturbed sentences as well as perturbation as relations between words are offered.

Synthesizing Robust Adversarial Examples

The existence of robust 3D adversarial objects is demonstrated, and the first algorithm for synthesizing examples that are adversarial over a chosen distribution of transformations is presented, which synthesizes two-dimensional adversarial images that are robust to noise, distortion, and affine transformation.

Practical Black-Box Attacks against Machine Learning

This work introduces the first practical demonstration of an attacker controlling a remotely hosted DNN with no knowledge of the model internals or its training data, and finds that this black-box attack strategy is capable of evading defense strategies previously found to make adversarial example crafting harder.

Adversarial Examples for Evaluating Reading Comprehension Systems

This work proposes an adversarial evaluation scheme for the Stanford Question Answering Dataset that tests whether systems can answer questions about paragraphs that contain adversarially inserted sentences without changing the correct answer or misleading humans.

Intriguing properties of neural networks

It is found that there is no distinction between individual high-level units and random linear combinations of high-level units, according to various methods of unit analysis, and it is suggested that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.