Corpus ID: 209444890

AdvCodec: Towards A Unified Framework for Adversarial Text Generation

by Boxin Wang, Hengzhi Pei, Han Liu, and Bo Li
While there has been great interest in generating imperceptible adversarial examples in continuous data domains (e.g. images and audio) to explore model vulnerabilities, generating adversarial text in the discrete domain remains challenging. The main contribution of this paper is a general targeted attack framework, AdvCodec, for adversarial text generation, which addresses the challenge of the discrete input space and is easily adapted to general natural language processing (NLP…
Exploring TEXTFOOLER’s Syntactically- and Semantically-Sound Distributed Adversarial Attack Methodology
TEXTFOOLER [1] is a recent adversarial attack model that has established itself as a potentially devastating method against a diverse set of NLP models, including the well-acclaimed BERT [2] model.
Adversarial Attacks and Defenses on Cyber–Physical Systems: A Survey
A general working flow for adversarial attacks on CPSs is introduced, and a clear taxonomy is provided to organize existing attacks effectively and indicate where defenses can potentially be performed in CPSs.
Contextualized Perturbation for Textual Adversarial Attack
CLARE is a ContextuaLized AdversaRial Example generation model that produces fluent, grammatical outputs through a mask-then-infill procedure; it can flexibly combine and apply perturbations at any position in the input, and is thus able to attack the victim model more effectively with fewer edits.
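The mask-then-infill idea can be sketched in a few lines: mask one position, ask a language model for fluent fills, and keep the fill that most reduces the victim model's confidence. Everything below — the victim scorer and the fill proposals — is a hypothetical, hand-coded stand-in, not CLARE's actual models.

```python
def victim_confidence(tokens):
    # Hypothetical victim classifier: "confidence" in the positive label
    # drops when positive words vanish from the input.
    positive = {"great", "excellent", "good"}
    return sum(t in positive for t in tokens) / max(len(tokens), 1)

def infill_candidates(tokens, i):
    # Stand-in for a masked LM's top fluent predictions at position i.
    proposals = {"great": ["fine", "decent"], "movie": ["film", "picture"]}
    return proposals.get(tokens[i], [])

def mask_then_infill(tokens):
    # Try masking each position in turn; keep the single substitution
    # that most reduces the victim's confidence.
    best, best_conf = list(tokens), victim_confidence(tokens)
    for i in range(len(tokens)):
        for cand in infill_candidates(tokens, i):
            trial = list(tokens)
            trial[i] = cand
            conf = victim_confidence(trial)
            if conf < best_conf:
                best, best_conf = trial, conf
    return best

print(mask_then_infill(["a", "great", "movie"]))  # replaces "great"
```

In the real model the infill proposals come from a pretrained masked language model, which is what keeps the edited text fluent and grammatical.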
Robustness to Programmable String Transformations via Augmented Abstract Training
This paper shows how to decompose a set of user-defined string transformations into two component specifications, one that benefits from search and another from abstraction, and uses this technique to train models that are robust to combinations of user-defined transformations mimicking spelling mistakes and other meaning-preserving transformations.
Beyond User Self-Reported Likert Scale Ratings: A Comparison Model for Automatic Dialog Evaluation
This work proposes CMADE (Comparison Model for Automatic Dialog Evaluation), which automatically cleans self-reported user ratings as it trains on them: it first uses a self-supervised method to learn a better dialog feature representation, then uses KNN and Shapley values to remove confusing samples.


TextBugger: Generating Adversarial Text Against Real-world Applications
This paper presents TextBugger, a general attack framework for generating adversarial texts, and empirically evaluates its effectiveness, evasiveness, and efficiency on a set of real-world DLTU systems and services used for sentiment analysis and toxic content detection.
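The character-level side of such attacks can be illustrated with a few simple perturbation operators — delete, swap adjacent characters, substitute. The operators and the placeholder substitute character below are illustrative, not TextBugger's exact bug set (which, for instance, substitutes visually similar glyphs).

```python
def char_bugs(word):
    # Generate simple character-level variants of a word: delete one
    # character, swap adjacent characters, or substitute a character
    # (here with a placeholder "*"; real attacks prefer visually
    # similar characters so the change is hard to perceive).
    variants = set()
    for i in range(len(word)):
        variants.add(word[:i] + word[i + 1:])                              # delete
        if i + 1 < len(word):
            variants.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])  # swap
        variants.add(word[:i] + "*" + word[i + 1:])                        # substitute
    variants.discard(word)  # drop no-op variants
    return variants

print(sorted(char_bugs("bad")))
```

An attack would score each variant against the victim model and keep the one that flips the prediction with the least visible change.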
Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples
A projected gradient method combined with group lasso and gradient regularization is proposed for crafting adversarial examples for sequence-to-sequence (seq2seq) models, whose inputs are discrete text strings and outputs have an almost infinite number of possibilities.
Discrete Adversarial Attacks and Submodular Optimization with Applications to Text Classification
It is proved that this set function is submodular for some popular neural network text classifiers under simplifying assumptions, which guarantees a $1-1/e$ approximation factor for attacks that use the greedy algorithm.
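The greedy procedure behind that guarantee is easy to sketch: repeatedly flip the position with the largest marginal gain until the budget is spent. The coverage-style gain function below is a toy monotone submodular stand-in for the real attack objective; all names are hypothetical.

```python
# Toy stand-in for the attack objective: each flipped position "covers"
# some set of features, and the gain of a set of flips is total coverage.
# Coverage functions like this are monotone and submodular.
coverage = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c"}}

def attack_gain(flipped):
    covered = set()
    for pos in flipped:
        covered |= coverage[pos]
    return len(covered)

def greedy_attack(positions, budget):
    # Greedily add the position with the largest marginal gain; for a
    # monotone submodular objective this achieves a 1 - 1/e factor.
    chosen = set()
    for _ in range(budget):
        best = max((p for p in positions if p not in chosen),
                   key=lambda p: attack_gain(chosen | {p}) - attack_gain(chosen),
                   default=None)
        if best is None:
            break
        chosen.add(best)
    return chosen

print(greedy_attack([0, 1, 2], budget=2))  # {0, 1}
```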
Towards Crafting Text Adversarial Samples
This paper proposes a new method of crafting adversarial text samples by modifying the original samples; it works best for datasets that have sub-categories within each class of examples.
Generating Natural Adversarial Examples
This paper proposes a framework to generate natural and legible adversarial examples that lie on the data manifold, by searching in the semantic space of a dense, continuous data representation, utilizing recent advances in generative adversarial networks.
Deep Text Classification Can be Fooled
An effective method is presented to craft adversarial text samples that successfully fool both state-of-the-art character-level and word-level DNN-based text classifiers while remaining difficult to perceive.
Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment
TextFooler is a simple but strong baseline for generating natural adversarial text; it outperforms state-of-the-art attacks in success rate and perturbation rate, and is utility-preserving: it preserves semantic content and grammaticality, and its outputs remain correctly classified by humans.
Generating Natural Language Adversarial Examples
A black-box population-based optimization algorithm is used to generate semantically and syntactically similar adversarial examples that fool well-trained sentiment analysis and textual entailment models with success rates of 97% and 70%, respectively.
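A minimal sketch of such population-based search: maintain a population of candidate substitutions, keep the candidates that most reduce the black-box victim's score, and breed new ones. The synonym table and victim scorer below are toy stand-ins, not the paper's actual components.

```python
import random

SYNONYMS = {"good": ["fine", "nice"], "film": ["movie", "picture"]}

def victim_score(tokens):
    # Toy black-box victim: confidence in the original (positive) label.
    return 1.0 if "good" in tokens else 0.2

def neighbors(tokens):
    # All single synonym substitutions of a candidate.
    for i, tok in enumerate(tokens):
        for syn in SYNONYMS.get(tok, []):
            child = list(tokens)
            child[i] = syn
            yield child

def genetic_attack(tokens, generations=5, pop_size=8, seed=0):
    rng = random.Random(seed)
    # Seed the population with the original and its one-step neighbors.
    population = [list(tokens)] + list(neighbors(tokens))
    for _ in range(generations):
        population.sort(key=victim_score)   # lower victim score = better attack
        if victim_score(population[0]) < 0.5:
            return population[0]            # attack succeeded
        parents = population[:pop_size // 2]
        offspring = []
        while len(parents) + len(offspring) < pop_size:
            cands = list(neighbors(rng.choice(parents)))
            offspring.append(rng.choice(cands) if cands else list(rng.choice(parents)))
        population = parents + offspring
    return population[0]

print(genetic_attack(["a", "good", "film"]))
```

The real algorithm also uses crossover between parents and constrains substitutions by embedding distance and language-model fluency, which this sketch omits.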
Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation
This work studies text classification under synonym replacements or character-flip perturbations and modifies the conventional log-likelihood training objective to train models that can be efficiently verified, which would otherwise incur exponential search complexity.
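Interval bound propagation itself is mechanically simple: push per-coordinate lower/upper bounds (e.g. the box containing all synonym substitutions of an input) through each layer using interval arithmetic. A minimal sketch for one linear layer plus ReLU, with made-up weights and input box:

```python
def linear_bounds(lo, hi, W, b):
    # Interval arithmetic for y = W x + b: a positive weight pairs the
    # lower output bound with the lower input bound, a negative weight
    # pairs it with the upper input bound (and vice versa).
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        l = bias + sum(w * (lo[j] if w >= 0 else hi[j]) for j, w in enumerate(row))
        u = bias + sum(w * (hi[j] if w >= 0 else lo[j]) for j, w in enumerate(row))
        out_lo.append(l)
        out_hi.append(u)
    return out_lo, out_hi

def relu_bounds(lo, hi):
    # ReLU is monotone, so it maps interval endpoints to endpoints.
    return [max(0.0, l) for l in lo], [max(0.0, u) for u in hi]

W = [[1.0, -2.0]]
b = [0.5]
lo, hi = linear_bounds([0.0, 0.0], [1.0, 1.0], W, b)
lo, hi = relu_bounds(lo, hi)
print(lo, hi)  # [0.0] [1.5]
```

If the certified output bounds keep the true class's logit above all others, the prediction is verified robust for every input in the box — no exponential enumeration of substitutions needed.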
Adversarial Example Generation with Syntactically Controlled Paraphrase Networks
A combination of automated and human evaluations shows that SCPNs generate paraphrases that follow their target specifications without decreasing paraphrase quality when compared to baseline (uncontrolled) paraphrase systems.