Corpus ID: 235422571

Disrupting Model Training with Adversarial Shortcuts

Authors: I. Evtimov, Ian Covert, Aditya Kusupati, T. Kohno
When data is publicly released for human consumption, it is unclear how to prevent its unauthorized usage for machine learning purposes. Successful model training may be preventable with carefully designed dataset modifications, and we present a proof-of-concept approach for the image classification setting. We propose methods based on the notion of adversarial shortcuts, which encourage models to rely on non-robust signals rather than semantic features, and our experiments demonstrate that…
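The abstract does not specify how the adversarial shortcuts are constructed. As a rough illustration of the general idea only, the sketch below assumes the simplest possible shortcut: one fixed, small, random sign pattern per class, added to every training image of that class and bounded by a hypothetical L∞ budget `epsilon`. The names `make_class_patterns` and `add_shortcuts` are hypothetical, and this is not presented as the authors' method.

```python
import numpy as np

def make_class_patterns(num_classes, shape, epsilon, seed=0):
    """Hypothetical shortcut: one fixed random +/-epsilon pattern per class."""
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(num_classes, *shape))
    return epsilon * signs

def add_shortcuts(images, labels, patterns):
    """Add each image's class pattern, then clip back to the valid [0, 1] range.

    images:   float array of shape (N, H, W, C) with values in [0, 1]
    labels:   int array of shape (N,) with class indices
    patterns: array of shape (num_classes, H, W, C) from make_class_patterns
    """
    perturbed = images + patterns[labels]
    # Clipping can only shrink a perturbation, so it stays within epsilon.
    return np.clip(perturbed, 0.0, 1.0)
```

Because every image of a class carries the same low-amplitude pattern, the label becomes predictable from that pattern alone, which is the kind of non-robust signal a model might latch onto instead of the image's semantic content.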
