Corpus ID: 235485405

Bad Characters: Imperceptible NLP Attacks

@article{Boucher2021BadCI,
  title={Bad Characters: Imperceptible NLP Attacks},
  author={Nicholas P. Boucher and Ilia Shumailov and Ross Anderson and Nicolas Papernot},
  journal={ArXiv},
  year={2021},
  volume={abs/2106.09898}
}
Several years of research have shown that machine-learning systems are vulnerable to adversarial examples, both in theory and in practice. Until now, such attacks have primarily targeted visual models, exploiting the gap between human and machine perception. Although text-based models have also been attacked with adversarial examples, such attacks struggled to preserve semantic meaning and indistinguishability. In this paper, we explore a large class of adversarial examples that can be used to…
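The key idea in the abstract is that Unicode offers perturbations that are invisible to human readers while changing the byte sequence a model consumes. Below is a minimal sketch of two such perturbation classes, invisible-character injection and homoglyph substitution; the specific codepoints and the example string are illustrative assumptions, not the paper's actual search procedure.

```python
# Minimal sketch (not the authors' code): invisible-character injection and
# homoglyph substitution applied to an input string. The text stays visually
# identical while its underlying codepoints, and thus the model's tokens, change.
ZERO_WIDTH_SPACE = "\u200b"                                  # renders as nothing
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}   # Latin -> Cyrillic lookalikes

def inject_invisible(text: str, position: int) -> str:
    """Insert a zero-width character at the given position."""
    return text[:position] + ZERO_WIDTH_SPACE + text[position:]

def swap_homoglyph(text: str, index: int) -> str:
    """Replace one character with a visually identical codepoint, if one is known."""
    ch = text[index]
    return text[:index] + HOMOGLYPHS.get(ch, ch) + text[index + 1:]

if __name__ == "__main__":
    original = "send money to alice"
    perturbed = swap_homoglyph(original, 14)      # the 'a' in "alice" -> Cyrillic
    perturbed = inject_invisible(perturbed, 4)    # invisible break after "send"
    print(original)
    print(perturbed)                              # looks identical when rendered
    print(original == perturbed)                  # False: the bytes differ
```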
Adversarial Text Normalization
TLDR
The Adversarial Text Normalizer, a novel method that restores baseline performance on attacked content with low computational overhead, is proposed; the authors find that text normalization provides a task-agnostic defense against character-level attacks that can be deployed as a supplement to adversarial retraining solutions, which are better suited to semantic alterations.
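As a rough illustration of what such task-agnostic, character-level normalization can look like (assumed behaviour, not the Adversarial Text Normalizer's actual implementation), the sketch below strips invisible format-class characters and applies Unicode compatibility normalization before text reaches a model.

```python
# Hedged sketch of a character-level normalization defense. Note that NFKC folds
# many ligatures and width variants but does not map cross-script homoglyphs;
# a confusables table would be needed for those.
import unicodedata

def normalize(text: str) -> str:
    # Drop format-class characters (zero-width spaces, Bidi controls, ...).
    visible = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # Fold compatibility variants (ligatures, fullwidth forms, ...).
    return unicodedata.normalize("NFKC", visible)

print(normalize("pa\u200byp\u0430l"))   # zero-width space removed; Cyrillic 'а' survives NFKC
```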
Pipe Overflow: Smashing Voice Authentication for Fun and Profit
TLDR
It is demonstrated that for tasks like speaker identification, a human is capable of producing analog adversarial examples directly with little cost and supervision: by simply speaking through a tube, an adversary reliably impersonates other speakers in the eyes of ML models for speaker identification.
Denial-of-Service Attack on Object Detection Model Using Universal Adversarial Perturbation
TLDR
NMS-Sponge is proposed, a novel approach that negatively affects the decision latency of YOLO, a state-of-the-art object detector, and compromises the model's availability by applying a universal adversarial perturbation (UAP).
Measure and Improve Robustness in NLP Models: A Survey
TLDR
This paper unifies various lines of work on identifying robustness failures and evaluating models' robustness, and presents mitigation strategies that are data-driven, model-driven, and inductive-prior-based, with a more systematic view of how to effectively improve robustness in NLP models.
Trojan Source: Invisible Vulnerabilities
TLDR
This work presents a new type of attack in which source code is maliciously encoded so that it appears different to a compiler and to the human eye, and proposes definitive compiler-level defenses to block this attack.
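The mechanism is easiest to see with Unicode bidirectional-control characters; the toy snippet below (my own construction, not the paper's proof-of-concept) shows how the rendered order of a string can differ from the logical order an interpreter compares against.

```python
# Toy illustration of display order vs. logical order. Depending on the editor,
# the string may render with "nimda" shown reversed as "admin", while the
# interpreter still sees the control characters and the original order.
RLO = "\u202e"   # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"   # POP DIRECTIONAL FORMATTING

access_level = "user" + RLO + "nimda" + PDF
print(repr(access_level))              # logical order, control characters visible
print(access_level == "useradmin")     # False, despite what a viewer may display
```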
Towards a Responsible AI Development Lifecycle: Lessons From Information Security
TLDR
This work proposes a framework for responsibly developing artificial intelligence systems by incorporating lessons from the field of information security and the secure development lifecycle to overcome challenges associated with protecting users in adversarial settings.
PostCog: A Tool for Interdisciplinary Research into Underground Forums at Scale
TLDR
PostCog is a web application developed to support users from both technical and non-technical backgrounds in forum analyses, such as search, information extraction and cross-forum comparison, and is made available for academic research upon signing an agreement with the Cambridge Cybercrime Centre.

References

Showing 1-10 of 93 references
Generating Natural Language Adversarial Examples
TLDR
A black-box population-based optimization algorithm is used to generate semantically and syntactically similar adversarial examples that fool well-trained sentiment analysis and textual entailment models with success rates of 97% and 70%, respectively.
Generating Natural Adversarial Examples
TLDR
This paper proposes a framework to generate natural and legible adversarial examples that lie on the data manifold by searching the semantic space of a dense, continuous data representation, utilizing recent advances in generative adversarial networks.
TextBugger: Generating Adversarial Text Against Real-world Applications
TLDR
This paper presents TextBugger, a general attack framework for generating adversarial texts, and empirically evaluates its effectiveness, evasiveness, and efficiency on a set of real-world DLTU systems and services used for sentiment analysis and toxic content detection.
Black-Box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers
TLDR
A novel algorithm, DeepWordBug, is presented to effectively generate small text perturbations in a black-box setting that force a deep-learning classifier to misclassify a text input.
A Reinforced Generation of Adversarial Examples for Neural Machine Translation
TLDR
The results show that the method efficiently produces stable attacks with meaning-preserving adversarial examples that could expose pitfalls for a given performance metric, e.g., BLEU, and could target any given neural machine translation architecture.
Certified Robustness to Adversarial Word Substitutions
TLDR
This paper trains the first models that are provably robust to all word substitutions in an exponentially large family of label-preserving transformations, in which every word in the input can be replaced with a similar word.
ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models
TLDR
An effective black-box attack that also only has access to the input (images) and the output (confidence scores) of a targeted DNN is proposed, sparing the need for training substitute models and avoiding the loss in attack transferability.
Towards Deep Learning Models Resistant to Adversarial Attacks
TLDR
This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP
TLDR
TextAttack, a Python framework for adversarial attacks, data augmentation, and adversarial training in NLP, is introduced and is democratizing NLP: anyone can try data augmentation and adversarial training on any model or dataset with just a few lines of code.
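For orientation, a minimal usage sketch in the spirit of TextAttack's documented quick-start follows; the model name, recipe, and dataset are illustrative choices, and the API details should be checked against the current TextAttack documentation rather than taken from this page.

```python
# Hedged sketch of running a ready-made TextAttack recipe against a
# HuggingFace model; names and arguments are assumptions based on the
# framework's quick-start examples, not code from the paper.
import transformers
import textattack

model = transformers.AutoModelForSequenceClassification.from_pretrained(
    "textattack/bert-base-uncased-imdb")
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "textattack/bert-base-uncased-imdb")
wrapper = textattack.models.wrappers.HuggingFaceModelWrapper(model, tokenizer)

attack = textattack.attack_recipes.TextFoolerJin2019.build(wrapper)
dataset = textattack.datasets.HuggingFaceDataset("imdb", split="test")
attacker = textattack.Attacker(attack, dataset,
                               textattack.AttackArgs(num_examples=5))
attacker.attack_dataset()
```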
Practical Black-Box Attacks against Machine Learning
TLDR
This work introduces the first practical demonstration of an attacker controlling a remotely hosted DNN with no such knowledge, and finds that this black-box attack strategy is capable of evading defense strategies previously found to make adversarial example crafting harder.