Corpus ID: 235485405

Bad Characters: Imperceptible NLP Attacks

Nicholas P. Boucher, Ilia Shumailov, Ross Anderson, Nicolas Papernot
Several years of research have shown that machine-learning systems are vulnerable to adversarial examples, both in theory and in practice. Until now, such attacks have primarily targeted visual models, exploiting the gap between human and machine perception. Although text-based models have also been attacked with adversarial examples, such attacks have struggled to preserve semantic meaning and indistinguishability. In this paper, we explore a large class of adversarial examples that can be used to… 
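A minimal Python sketch of the kind of imperceptible perturbation the abstract alludes to: invisible characters and homoglyphs that leave the rendered text visually unchanged while altering the underlying code points. The helper names and the sample string are illustrative, not taken from the paper.

```python
# Invisible and look-alike Unicode characters: the attacked string renders
# the same as the original but differs at the code-point level.

ZWSP = "\u200b"  # zero-width space: invisible when rendered
HOMOGLYPHS = {"a": "\u0430", "o": "\u043e", "e": "\u0435"}  # Cyrillic look-alikes

def inject_invisible(text: str, position: int) -> str:
    """Insert a zero-width space; the string renders identically."""
    return text[:position] + ZWSP + text[position:]

def swap_homoglyph(text: str) -> str:
    """Replace the first Latin letter that has a Cyrillic look-alike."""
    for i, ch in enumerate(text):
        if ch in HOMOGLYPHS:
            return text[:i] + HOMOGLYPHS[ch] + text[i + 1:]
    return text

clean = "adversarial example"
attacked = swap_homoglyph(inject_invisible(clean, 4))
print(clean == attacked)   # False: the strings differ despite rendering alike
```

Because most tokenizers operate on code points rather than rendered glyphs, such edits can change a model's input representation while remaining imperceptible to a human reader.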
Adversarial Text Normalization
The Adversarial Text Normalizer is proposed, a novel method that restores baseline performance on attacked content with low computational overhead. The authors find that text normalization provides a task-agnostic defense against character-level attacks and can be deployed alongside adversarial retraining solutions, which are better suited to semantic alterations.
Pipe Overflow: Smashing Voice Authentication for Fun and Profit
It is demonstrated that for tasks like speaker identification, a human can produce analog adversarial examples directly with little cost and supervision: by simply speaking through a tube, an adversary reliably impersonates other speakers in the eyes of ML models for speaker identification.
Denial-of-Service Attack on Object Detection Model Using Universal Adversarial Perturbation
NMS-Sponge is proposed, a novel approach that negatively affects the decision latency of YOLO, a state-of-the-art object detector, and compromises the model's availability by applying a universal adversarial perturbation (UAP).
Measure and Improve Robustness in NLP Models: A Survey
This paper unifies various lines of work on identifying robustness failures and evaluating models' robustness, and presents mitigation strategies that are data-driven, model-driven, and inductive-prior-based, with a more systematic view of how to effectively improve robustness in NLP models.
Trojan Source: Invisible Vulnerabilities
This work presents a new type of attack in which source code is maliciously encoded so that it appears different to a compiler and to the human eye, and proposes definitive compiler-level defenses to block this attack.
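The Trojan Source attack hinges on Unicode bidirectional control characters, which reorder how source code is *displayed* without changing the logical character order a compiler parses. A small Python illustration (the variable names are illustrative only):

```python
# U+202E RIGHT-TO-LEFT OVERRIDE is invisible in most renderings, yet it is
# present in the logical character sequence that a compiler or interpreter
# actually sees -- so displayed code and parsed code can diverge.

RLO = "\u202e"  # right-to-left override control character

logical = "abc" + RLO + "def"
print(len(logical))                        # 7: six letters plus one control
print([hex(ord(c)) for c in logical])      # the override shows in the code points
```

Defenses of the kind the paper proposes amount to detecting or rejecting such control characters in source files before compilation.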
Towards a Responsible AI Development Lifecycle: Lessons From Information Security
This work proposes a framework for responsibly developing artificial intelligence systems by incorporating lessons from the field of information security and the secure development lifecycle to overcome challenges associated with protecting users in adversarial settings.
PostCog: A Tool for Interdisciplinary Research into Underground Forums at Scale
PostCog is a web application developed to support users from both technical and non-technical backgrounds in forum analyses, such as search, information extraction and cross-forum comparison, and is made available for academic research upon signing an agreement with the Cambridge Cybercrime Centre.


Generating Natural Language Adversarial Examples
A black-box population-based optimization algorithm is used to generate semantically and syntactically similar adversarial examples that fool well-trained sentiment analysis and textual entailment models with success rates of 97% and 70%, respectively.
Generating Natural Adversarial Examples
This paper proposes a framework to generate natural and legible adversarial examples that lie on the data manifold, by searching in semantic space of dense and continuous data representation, utilizing the recent advances in generative adversarial networks.
TextBugger: Generating Adversarial Text Against Real-world Applications
This paper presents TextBugger, a general attack framework for generating adversarial texts, and empirically evaluates its effectiveness, evasiveness, and efficiency on a set of real-world DLTU systems and services used for sentiment analysis and toxic content detection.
Black-Box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers
A novel algorithm is presented, DeepWordBug, to effectively generate small text perturbations in a black-box setting that forces a deep-learning classifier to misclassify a text input.
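A minimal black-box sketch in the spirit of DeepWordBug: score each token by how much removing it changes a classifier's confidence, then apply a small character edit to the most influential token. The `model_confidence` function is a hypothetical stand-in for a real black-box model API, and the swap-based `perturb` is just one of several character edits the algorithm uses.

```python
def model_confidence(text: str) -> float:
    """Hypothetical black-box scorer; replace with a real model's output."""
    return 1.0 if "terrible" in text else 0.3

def token_scores(text: str):
    """Score tokens by the confidence drop caused by deleting each one."""
    tokens = text.split()
    base = model_confidence(text)
    scores = []
    for i in range(len(tokens)):
        reduced = " ".join(tokens[:i] + tokens[i + 1:])
        scores.append((base - model_confidence(reduced), i))
    return tokens, scores

def perturb(token: str) -> str:
    """Swap the first two characters -- one of DeepWordBug's edit operations."""
    return token[1] + token[0] + token[2:] if len(token) > 1 else token

tokens, scores = token_scores("the food was terrible today")
_, target = max(scores)            # the token whose removal matters most
tokens[target] = perturb(tokens[target])
print(" ".join(tokens))            # → "the food was etrrible today"
```

The perturbed sentence now evades the toy keyword-based scorer while remaining easy for a human to read, which mirrors the black-box setting the paper studies.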
A Reinforced Generation of Adversarial Examples for Neural Machine Translation
The results show that the method efficiently produces stable attacks with meaning-preserving adversarial examples that could expose pitfalls for a given performance metric, e.g., BLEU, and could target any given neural machine translation architecture.
Certified Robustness to Adversarial Word Substitutions
This paper trains the first models that are provably robust to all word substitutions in this exponentially large family of label-preserving transformations, in which every word in the input can be replaced with a similar word.
ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models
An effective black-box attack that also only has access to the input (images) and the output (confidence scores) of a targeted DNN is proposed, sparing the need for training substitute models and avoiding the loss in attack transferability.
Towards Deep Learning Models Resistant to Adversarial Attacks
This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP
TextAttack, a Python framework for adversarial attacks, data augmentation, and adversarial training in NLP, is introduced; it democratizes NLP by letting anyone try data augmentation and adversarial training on any model or dataset with just a few lines of code.
Practical Black-Box Attacks against Machine Learning
This work introduces the first practical demonstration of an attacker controlling a remotely hosted DNN with no such knowledge, and finds that this black-box attack strategy is capable of evading defense strategies previously found to make adversarial example crafting harder.