Corpus ID: 219956260

Differentiable Language Model Adversarial Attacks on Categorical Sequence Classifiers

@article{Fursov2020DifferentiableLM,
  title={Differentiable Language Model Adversarial Attacks on Categorical Sequence Classifiers},
  author={I. Fursov and Alexey Zaytsev and Nikita Klyuchnikov and Andrey Kravchenko and Evgeny V. Burnaev},
  journal={ArXiv},
  year={2020},
  volume={abs/2006.11078}
}
An adversarial attack paradigm explores various scenarios for the vulnerability of deep learning models: minor changes of the input can force a model failure. Most state-of-the-art frameworks focus on adversarial attacks for images and other structured model inputs, but not for categorical sequence models. Successful attacks on classifiers of categorical sequences are challenging because the model input consists of tokens from finite sets, so the classifier score is non-differentiable with respect to the input and gradient-based attacks are not directly applicable.
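
To make the obstacle above concrete: a hard token index blocks gradient flow, but relaxing each position to a distribution over the vocabulary restores it. Below is a minimal PyTorch sketch of this relaxation via Gumbel-softmax on a toy classifier; all names are illustrative, and this is not the authors' implementation, which instead fine-tunes a language model against a surrogate classifier score and a differentiable edit-distance term.

```python
import torch
import torch.nn.functional as F

# Toy sequence classifier: embedding -> mean pooling -> linear head.
vocab_size, emb_dim, num_classes, seq_len = 100, 16, 2, 8
emb = torch.nn.Embedding(vocab_size, emb_dim)
head = torch.nn.Linear(emb_dim, num_classes)

def classify_soft(token_probs):
    # token_probs: (seq_len, vocab_size), each row a distribution over tokens.
    # Mixing embedding rows by probability keeps the pipeline differentiable,
    # unlike a hard integer-index lookup.
    x = token_probs @ emb.weight        # (seq_len, emb_dim)
    return head(x.mean(dim=0))          # (num_classes,)

tokens = torch.randint(0, vocab_size, (seq_len,))  # discrete: no gradient path
true_label = torch.zeros(1, dtype=torch.long)

# Optimize per-position logits, initialized near the original one-hot tokens.
logits = (5.0 * F.one_hot(tokens, vocab_size).float()).requires_grad_()
opt = torch.optim.Adam([logits], lr=0.1)

for _ in range(100):
    probs = F.gumbel_softmax(logits, tau=0.5)   # near-one-hot yet differentiable
    score = classify_soft(probs).unsqueeze(0)   # (1, num_classes)
    loss = -F.cross_entropy(score, true_label)  # ascend the classification loss
    opt.zero_grad()
    loss.backward()
    opt.step()

adversarial_tokens = logits.argmax(dim=-1)      # project back to discrete tokens
```

The relaxation trades exact tokens for differentiability during optimization; taking the argmax of the optimized logits recovers a discrete adversarial sequence at the end.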
Adversarial Attacks on Deep Models for Financial Transaction Records
TLDR: This work shows that embedding protection from adversarial attacks improves model robustness, allowing wider adoption of deep models for transaction records in banking and finance.
Gradient-based adversarial attacks on categorical sequence models via traversing an embedded world
TLDR: Results on money transactions, medical fraud, and NLP datasets suggest that the proposed methods generate reasonable adversarial sequences that are close to the originals yet fool machine learning models.
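
The "embedded world" traversal summarized above admits a similarly small sketch (illustrative, not the paper's code): take signed-gradient steps on the sequence's embedding vectors, then project each perturbed vector back to the nearest vocabulary token.

```python
import torch
import torch.nn.functional as F

# Same toy classifier as before: embedding -> mean pooling -> linear head.
vocab_size, emb_dim, num_classes, seq_len = 100, 16, 2, 8
emb = torch.nn.Embedding(vocab_size, emb_dim)
head = torch.nn.Linear(emb_dim, num_classes)

tokens = torch.randint(0, vocab_size, (seq_len,))
true_label = torch.zeros(1, dtype=torch.long)

# Walk along the loss gradient in the continuous embedding space.
x = emb(tokens).detach().clone().requires_grad_(True)
for _ in range(20):
    logits = head(x.mean(dim=0, keepdim=True))             # (1, num_classes)
    loss = F.cross_entropy(logits, true_label)
    grad, = torch.autograd.grad(loss, x)
    x = (x + 0.1 * grad.sign()).detach().requires_grad_()  # ascend the loss

# Project each perturbed embedding back to its nearest vocabulary token.
distances = torch.cdist(x.detach(), emb.weight)            # (seq_len, vocab_size)
adversarial_tokens = distances.argmin(dim=-1)
```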

References

Showing 1–10 of 44 references.
Adversarial Machine Learning at Scale
TLDR: Applies adversarial training to ImageNet, finds that single-step attacks are best for mounting black-box attacks, and resolves a "label leaking" effect that causes adversarially trained models to perform better on adversarial examples than on clean ones.
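
For context, single-step adversarial training of the kind this reference scales to ImageNet can be sketched on a toy model as follows; the data, architecture, and hyperparameters here are placeholders.

```python
import torch
import torch.nn.functional as F

# Single-step (FGSM) adversarial training on a synthetic task.
model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
eps = 0.1  # perturbation budget

for _ in range(100):
    x = torch.randn(64, 10)                   # placeholder batch
    y = (x.sum(dim=1) > 0).long()             # placeholder labels
    x.requires_grad_(True)
    grad, = torch.autograd.grad(F.cross_entropy(model(x), y), x)
    x_adv = (x + eps * grad.sign()).detach()  # one-step adversarial example

    # Train on a mix of clean and adversarial inputs.
    opt.zero_grad()
    loss = (F.cross_entropy(model(x.detach()), y)
            + F.cross_entropy(model(x_adv), y))
    loss.backward()
    opt.step()
```

The "label leaking" effect arises because single-step examples crafted from the true label encode that label; a common fix is to craft them from the model's own predictions instead.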
TextTricker: Loss-based and gradient-based adversarial attacks on text classification models
TLDR: Proposes TextTricker, a white-box adversarial attack algorithm that supports both targeted and non-targeted attacks on text classification models and achieves a notably higher attack success rate than baselines.
Generating Natural Language Adversarial Examples on a Large Scale with Generative Models
TLDR: Proposes an end-to-end solution, termed unrestricted adversarial text generation, that efficiently generates adversarial texts from scratch with generative models rather than perturbing given texts.
Black-Box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers
TLDR: Presents DeepWordBug, an algorithm that generates small text perturbations in a black-box setting to force a deep-learning classifier to misclassify a text input.
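
A minimal black-box sketch in the spirit of DeepWordBug: rank words by how much the victim's score drops when each is deleted, then apply a character-level transformation to the most important word. `victim_score` is a hypothetical callable standing in for query access to the target model.

```python
import random

def word_importance(text, victim_score):
    # Score each word by how much the black-box score drops when it is removed.
    words = text.split()
    base = victim_score(text)
    drops = []
    for i in range(len(words)):
        ablated = " ".join(words[:i] + words[i + 1:])
        drops.append(base - victim_score(ablated))
    return words, drops

def swap_adjacent_chars(word):
    # One simple character-level transformer: swap two neighboring characters.
    if len(word) < 2:
        return word
    i = random.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def attack(text, victim_score):
    # Perturb the single most important word; real attacks iterate
    # under an edit-distance budget until the label flips.
    words, drops = word_importance(text, victim_score)
    target = max(range(len(words)), key=lambda i: drops[i])
    words[target] = swap_adjacent_chars(words[target])
    return " ".join(words)
```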
Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey
TLDR: Presents the first comprehensive survey on adversarial attacks on deep learning in computer vision, reviewing works that design adversarial attacks, analyze the existence of such attacks, and propose defenses against them.
Adversarial Examples: Attacks and Defenses for Deep Learning
TLDR: Summarizes methods for generating adversarial examples for DNNs, proposes a taxonomy of these methods, and discusses three major challenges in adversarial examples along with potential solutions.
Crafting adversarial input sequences for recurrent neural networks
TLDR: Investigates adversarial input sequences for recurrent neural networks processing sequential data and shows that algorithms previously introduced to craft adversarial samples misclassified by feed-forward neural networks can be adapted to recurrent neural networks.
Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples
TLDR: Proposes a projected gradient method combined with group lasso and gradient regularization for crafting adversarial examples against sequence-to-sequence (seq2seq) models, whose inputs are discrete text strings and whose outputs have an almost infinite number of possibilities.
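
One ingredient named above, the group-lasso penalty, can be illustrated on a toy classifier (Seq2Sick itself attacks seq2seq models with projected gradient steps; plain Adam here only approximates the induced sparsity): penalizing the L2 norm of each position's perturbation pushes whole rows toward zero, so few tokens end up changed after projection.

```python
import torch
import torch.nn.functional as F

# Toy classifier again: embedding -> mean pooling -> linear head.
vocab_size, emb_dim, num_classes, seq_len = 100, 16, 2, 8
emb = torch.nn.Embedding(vocab_size, emb_dim)
head = torch.nn.Linear(emb_dim, num_classes)

tokens = torch.randint(0, vocab_size, (seq_len,))
true_label = torch.zeros(1, dtype=torch.long)
x0 = emb(tokens).detach()

# Small random init avoids the non-differentiable point of the norm at zero.
delta = (0.01 * torch.randn_like(x0)).requires_grad_()
opt = torch.optim.Adam([delta], lr=0.05)

for _ in range(50):
    logits = head((x0 + delta).mean(dim=0, keepdim=True))
    # Group lasso over positions: summing the L2 norms of the per-token
    # perturbation rows encourages most rows to stay near zero.
    group_lasso = delta.norm(dim=-1).sum()
    loss = -F.cross_entropy(logits, true_label) + 0.1 * group_lasso
    opt.zero_grad()
    loss.backward()
    opt.step()

# Map perturbed embeddings back onto the vocabulary.
adv_tokens = torch.cdist(x0 + delta.detach(), emb.weight).argmin(dim=-1)
```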
Adversarial Attacks on Neural Networks for Graph Data
TLDR: Introduces the first study of adversarial attacks on attributed graphs, focusing on models that exploit graph convolutions, and generates adversarial perturbations targeting both node features and graph structure while taking dependencies between instances into account.
Adversarial Attack and Defense on Graph Data: A Survey
TLDR: Systematically organizes the considered works by the features of each topic and provides a unified formulation for adversarial learning on graph data that covers most adversarial learning studies on graphs.