Corpus ID: 6706414

Explaining and Harnessing Adversarial Examples

@article{Goodfellow2015ExplainingAH,
  title={Explaining and Harnessing Adversarial Examples},
  author={Ian J. Goodfellow and Jonathon Shlens and Christian Szegedy},
  journal={CoRR},
  year={2015},
  volume={abs/1412.6572}
}
Several machine learning models, including neural networks, consistently misclassify adversarial examples: inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence. Early attempts at explaining this phenomenon focused on nonlinearity and overfitting. We argue instead that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature.
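This linear view motivates the paper's fast gradient sign method, which perturbs an input in the direction of the sign of the loss gradient. The sketch below is an illustrative implementation rather than the authors' code; it assumes a differentiable PyTorch classifier model, inputs x scaled to [0, 1], integer labels y, and a max-norm budget eps.

import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps):
    # Fast gradient sign method: one signed gradient step that increases the loss.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    # Clamp so the perturbed input remains a valid image in [0, 1].
    return x_adv.clamp(0.0, 1.0).detach()

Taking the sign of the gradient, rather than the gradient itself, gives the worst-case step under a max-norm constraint on the perturbation, which is why even a very small eps can flip the prediction with high confidence.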
Unsupervised Detection of Adversarial Examples with Model Explanations
TLDR
This work proposes a simple yet effective method to detect adversarial examples using techniques developed to explain the model's behavior, and is the first to suggest an unsupervised defense method based on model explanations.
Predicting Adversarial Examples with High Confidence
TLDR
This work links robustness with non-calibrated model confidence on noisy images, providing a data-augmentation-free path forward, as an overly confident model is more likely to be vulnerable to adversarial examples.
Hitting Depth: Investigating Robustness to Adversarial Examples in Deep Convolutional Neural Networks
Machine learning models, including Convolutional Neural Networks (CNNs), are susceptible to adversarial examples: input images that have been perturbed to deliberately fool a model into an incorrect classification.
Intriguing Properties of Adversarial Examples
TLDR
This work argues that the origin of adversarial examples is primarily due to an inherent uncertainty that neural networks have about their predictions, and shows that the functional form of this uncertainty is independent of architecture, dataset, and training protocol; and depends only on the statistics of the logit differences of the network.
Deep neural rejection against adversarial examples
TLDR
This work proposes a deep neural rejection mechanism to detect adversarial examples, based on the idea of rejecting samples that exhibit anomalous feature representations at different network layers, and empirically shows that this approach outperforms previously proposed methods that detect adversarial examples by analyzing only the feature representation provided by the output network layer.
Vulnerability of classifiers to evolutionary generated adversarial examples
TLDR
An evolutionary algorithm is proposed that can generate adversarial examples for any machine learning model in the black-box attack scenario; these examples are found without access to the model's parameters, only by querying the model at hand.
Are Accuracy and Robustness Correlated
TLDR
It is demonstrated that better machine learning models are less vulnerable to adversarial examples, and that adversarial examples are found to transfer across models mostly when their network topologies are similar.
The Dimpled Manifold Model of Adversarial Examples in Machine Learning
TLDR
A new conceptual framework is introduced which provides a simple explanation for why adversarial examples exist, why their perturbations have such tiny norms, why these perturbations look like random noise, and why a network which was adversarially trained with incorrectly labeled images can still correctly classify test images.
Harnessing Model Uncertainty for Detecting Adversarial Examples
Deep Learning models are vulnerable to adversarial examples, i.e. images obtained via deliberate imperceptible perturbations, such that the model misclassifies them with high confidence.
Learning Universal Adversarial Perturbations with Generative Models
TLDR
This work introduces universal adversarial networks, a generative network that is capable of fooling a target classifier when its generated output is added to a clean sample from a dataset.

References

Showing 1-10 of 22 references
Towards Deep Neural Network Architectures Robust to Adversarial Examples
TLDR
Deep Contractive Network is proposed, a model with a new end-to-end training procedure that includes a smoothness penalty inspired by the contractive autoencoder (CAE) to increase the network robustness to adversarial examples, without a significant performance penalty.
Intriguing properties of neural networks
TLDR
It is found that there is no distinction between individual high-level units and random linear combinations of high-level units, according to various methods of unit analysis, and it is suggested that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.
Dropout: a simple way to prevent neural networks from overfitting
TLDR
It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Deep neural networks are easily fooled: High confidence predictions for unrecognizable images
TLDR
This work takes convolutional neural networks trained to perform well on either the ImageNet or MNIST datasets and uses evolutionary algorithms or gradient ascent to find images that DNNs label with high confidence as belonging to each dataset class; the resulting fooling images raise questions about the generality of DNN computer vision.
Maxout Networks
TLDR
A simple new model called maxout is defined, designed both to facilitate optimization by dropout and to improve the accuracy of dropout's fast approximate model averaging technique.
Learning Multiple Layers of Features from Tiny Images
TLDR
It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.
Multi-Prediction Deep Boltzmann Machines
TLDR
The multi-prediction deep Boltzmann machine does not require greedy layerwise pretraining, and outperforms the standard DBM at classification, classification with missing inputs, and mean field prediction tasks.
Visual Causal Feature Learning
TLDR
The Causal Coarsening Theorem is proved, which allows causal knowledge to be gained from observational data with minimal experimental effort, and an active learning scheme is proposed to learn a manipulator function that performs optimal manipulations on the image to automatically identify the visual cause of a target behavior.
Going deeper with convolutions
We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Large Scale Distributed Deep Networks
TLDR
This paper considers the problem of training a deep network with billions of parameters using tens of thousands of CPU cores and develops two algorithms for large-scale distributed training, Downpour SGD and Sandblaster L-BFGS, which increase the scale and speed of deep network training.