Corpus ID: 38134825

Towards Crafting Text Adversarial Samples

@article{Samanta2017TowardsCT,
  title={Towards Crafting Text Adversarial Samples},
  author={Suranjana Samanta and Sameep Mehta},
  journal={ArXiv},
  year={2017},
  volume={abs/1707.02812}
}
Adversarial samples are strategically modified samples crafted with the purpose of fooling the classifier at hand. An attacker introduces specially crafted adversarial samples to a deployed classifier, which the classifier then misclassifies. However, the samples are perceived to be drawn from entirely different classes, and thus it becomes hard to detect the adversarial samples. Most of the prior work has focused on synthesizing adversarial samples in the image domain…
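
To make the abstract's idea concrete, below is a minimal Python sketch of a saliency-guided word-substitution attack of the kind this line of work studies. It is an illustration only, not this paper's exact procedure; predict_proba (a target classifier returning a list of class probabilities for a string) and get_synonyms (a source of candidate replacement words) are hypothetical stand-ins supplied by the caller.

# Minimal sketch of a saliency-guided word-substitution attack on a text
# classifier (illustration only, not this paper's exact procedure).
# `predict_proba` and `get_synonyms` are hypothetical stand-ins supplied
# by the caller: the former returns a list of class probabilities for a
# string, the latter returns candidate replacement words.

def word_saliency(words, true_label, predict_proba):
    """Score each word by how much deleting it lowers the true-class probability."""
    base = predict_proba(" ".join(words))[true_label]
    return [base - predict_proba(" ".join(words[:i] + words[i + 1:]))[true_label]
            for i in range(len(words))]

def craft_adversarial(text, true_label, predict_proba, get_synonyms, max_changes=5):
    words = text.split()
    saliency = word_saliency(words, true_label, predict_proba)
    order = sorted(range(len(words)), key=lambda i: saliency[i], reverse=True)
    changed = 0
    for i in order:                       # attack the most salient words first
        if changed >= max_changes:
            break
        current = predict_proba(" ".join(words))[true_label]
        best_word, best_prob = words[i], current
        for candidate in get_synonyms(words[i]):
            trial = words[:i] + [candidate] + words[i + 1:]
            prob = predict_proba(" ".join(trial))[true_label]
            if prob < best_prob:
                best_word, best_prob = candidate, prob
        if best_word != words[i]:
            words[i] = best_word
            changed += 1
        probs = predict_proba(" ".join(words))
        if probs.index(max(probs)) != true_label:   # decision flipped: stop early
            break
    return " ".join(words)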

Discrete Adversarial Attacks and Submodular Optimization with Applications to Text Classification

TLDR
It is proved that this set function is submodular for some popular neural network text classifiers under a simplifying assumption, which guarantees a $1-1/e$ approximation factor for attacks that use the greedy algorithm (a generic greedy-attack sketch appears after this list of citing papers).

Adversarial Texts with Gradient Methods

TLDR
This work proposes a framework for adapting gradient-based attack methods from images to the text domain, successfully incorporates FGM and DeepFool into it, and empirically shows that WMD is closely related to the quality of the adversarial texts.

Generating Textual Adversarial Examples for Deep Learning Models: A Survey

TLDR
This article reviews research works that address this difference and generate textual adversarial examples on DNNs; it collects, selects, summarizes, discusses and analyzes these works in a comprehensive way and covers all the related information to make the article self-contained.

Black-Box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers

TLDR
A novel algorithm is presented, DeepWordBug, to effectively generate small text perturbations in a black-box setting that forces a deep-learning classifier to misclassify a text input.

Better constraints of imperceptibility, better adversarial examples in the text

TLDR
A stricter constraint for word-level attacks is proposed to obtain more imperceptible samples; it is also helpful for enhancing existing word-level attacks in adversarial training.

TextBugger: Generating Adversarial Text Against Real-world Applications

TLDR
This paper presents TextBugger, a general attack framework for generating adversarial texts, and empirically evaluates its effectiveness, evasiveness, and efficiency on a set of real-world DLTU systems and services used for sentiment analysis and toxic content detection.

Textual Adversarial Attacking with Limited Queries

TLDR
A novel attack method is proposed whose main idea is to fully utilize the adversarial examples generated by a local model and to transfer part of the attack to the local model to be completed ahead of time, thereby reducing the cost of attacking the target model.
...
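
Several of the citing works above (the submodular greedy attack, DeepWordBug, TextBugger, and the limited-query attack) share a common pattern: score the tokens of the input, then greedily perturb the highest-scoring ones while querying the target model. The sketch below illustrates that generic pattern under a query budget; predict_proba is a hypothetical black-box classifier returning a list of class probabilities, and the character edits shown are illustrative rather than any single paper's exact transformations.

# Generic sketch of a greedy, query-limited black-box attack: rank tokens by
# importance, then try simple character edits on the most important ones first.
# `predict_proba` is a hypothetical black-box classifier returning a list of
# class probabilities; the edits below are illustrative, not any one paper's.

import random

def char_edits(word):
    """A few simple character-level variants of a word (swap, drop, append)."""
    variants = set()
    if len(word) > 1:
        i = random.randrange(len(word) - 1)
        variants.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])  # swap two neighbours
        j = random.randrange(len(word))
        variants.add(word[:j] + word[j + 1:])                          # drop one character
    variants.add(word + random.choice("abcdefghijklmnopqrstuvwxyz"))   # append a character
    variants.discard(word)
    return list(variants)

def greedy_attack(text, true_label, predict_proba, query_budget=200):
    words = text.split()
    queries = 0

    def prob_true(tokens):
        nonlocal queries
        queries += 1
        return predict_proba(" ".join(tokens))[true_label]

    base = prob_true(words)
    # Token importance: delete each token once and measure the probability drop.
    importance = sorted(
        ((base - prob_true(words[:i] + words[i + 1:]), i) for i in range(len(words))),
        reverse=True)
    for _, i in importance:
        if queries >= query_budget:
            break
        current = prob_true(words)
        for variant in char_edits(words[i]):
            if queries >= query_budget:
                break
            p = prob_true(words[:i] + [variant] + words[i + 1:])
            if p < current:            # keep the edit only if it hurts the true class
                words[i], current = variant, p
    return " ".join(words)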

References

SHOWING 1-10 OF 16 REFERENCES

Deep Text Classification Can be Fooled

TLDR
An effective method is presented to craft text adversarial samples that can successfully fool both state-of-the-art character-level and word-level DNN-based text classifiers while remaining difficult to perceive.

Adversarial examples in the physical world

TLDR
It is found that a large fraction of adversarial examples are classified incorrectly even when perceived through the camera, which shows that even in physical-world scenarios, machine learning systems are vulnerable to adversarial examples.

Adversarial Examples for Generative Models

TLDR
This work explores methods of producing adversarial examples on deep generative models such as the variational autoencoder (VAE) and the VAE-GAN and presents three classes of attacks, motivating why an attacker might be interested in deploying such techniques against a target generative network.

Adversarial Attacks on Neural Network Policies

TLDR
This work shows existing adversarial example crafting techniques can be used to significantly degrade test-time performance of trained policies, even with small adversarial perturbations that do not interfere with human perception.

Practical Black-Box Attacks against Machine Learning

TLDR
This work introduces the first practical demonstration of an attacker controlling a remotely hosted DNN with no such knowledge, and finds that this black-box attack strategy is capable of evading defense strategies previously found to make adversarial example crafting harder.

Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples

TLDR
New transferability attacks between previously unexplored (substitute, victim) pairs of machine learning model classes, most notably SVMs and decision trees are introduced.

Explaining and Harnessing Adversarial Examples

TLDR
It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets. (A minimal FGSM sketch appears after this reference list.)

Deceiving Google's Perspective API Built for Detecting Toxic Comments

TLDR
It is shown that an adversary can subtly modify a highly toxic phrase in a way that the system assigns a significantly lower toxicity score to it, and that this attack can consistently reduce the toxicity scores to the level of non-toxic phrases.

Intriguing properties of neural networks

TLDR
It is found that there is no distinction between individual high-level units and random linear combinations of high-level units, according to various methods of unit analysis, and it is suggested that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.

Character-level Convolutional Networks for Text Classification

TLDR
This article constructs several large-scale datasets to show that character-level convolutional networks can achieve state-of-the-art or competitive results in text classification.
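
The reference "Explaining and Harnessing Adversarial Examples" introduces the fast gradient sign method (FGSM), x_adv = x + eps * sign(grad_x loss), which gradient-based text attacks such as "Adversarial Texts with Gradient Methods" apply in a continuous embedding space before mapping the result back to real words. The following is a minimal, self-contained numpy sketch of the FGSM step against a toy logistic-regression classifier; the model and the numbers are made up purely for illustration.

# Minimal numpy sketch of FGSM: x_adv = x + eps * sign(grad_x loss).
# The logistic-regression "model" below is a self-contained stand-in so the
# gradient can be computed in closed form; it is not any paper's model.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps=0.1):
    """One FGSM step against p(y=1|x) = sigmoid(w.x + b)."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w          # gradient of the cross-entropy loss w.r.t. the input
    return x + eps * np.sign(grad_x)

# Made-up numbers for illustration.
w = np.array([1.0, -2.0, 0.5])
b = 0.0
x = np.array([0.2, -0.1, 0.4])
x_adv = fgsm(x, y=1, w=w, b=b, eps=0.1)
print(sigmoid(w @ x + b), sigmoid(w @ x_adv + b))  # true-class probability drops after the step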