Corpus ID: 189928571

Uncovering Why Deep Neural Networks Lack Robustness: Representation Metrics that Link to Adversarial Attacks

@article{Vargas2019UncoveringWD,
  title={Uncovering Why Deep Neural Networks Lack Robustness: Representation Metrics that Link to Adversarial Attacks},
  author={Danilo Vasconcellos Vargas and Shashank Kotyan and Moe Matsuki},
  journal={ArXiv},
  year={2019},
  volume={abs/1906.06627}
}
Neural networks have been shown to be vulnerable to adversarial samples: slightly perturbed input images can change the classification of otherwise accurate models, showing that the learned representation is not as good as previously thought. To aid the development of better neural networks, it is important to evaluate to what extent current neural networks' representations capture the existing features. Here we propose a test that can evaluate neural networks using a new type of zero-shot…

References

Showing 1-10 of 41 references
One Pixel Attack for Fooling Deep Neural Networks
This paper proposes a novel method for generating one-pixel adversarial perturbations based on differential evolution (DE), which requires less adversarial information (a black-box attack) and can fool more types of networks due to the inherent features of DE.
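As a rough illustration of the idea (not the authors' implementation), the sketch below searches over a single pixel position and colour with SciPy's differential evolution so as to minimise a black-box classifier's confidence in the true class. The `predict` function is a hypothetical placeholder, and pixel values are assumed to lie in [0, 1].

```python
# Minimal sketch of a one-pixel attack driven by differential evolution.
# `predict` stands in for any black-box classifier returning class probabilities.
import numpy as np
from scipy.optimize import differential_evolution

def predict(image: np.ndarray) -> np.ndarray:
    """Placeholder black-box classifier: returns fake class probabilities."""
    rng = np.random.default_rng(int(image.sum() * 1e3) % (2**32))
    logits = rng.normal(size=10)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def one_pixel_attack(image, true_label, max_iter=30, pop_size=20):
    h, w, _ = image.shape

    def apply(params, img):
        x, y, r, g, b = params
        out = img.copy()
        out[int(x), int(y)] = [r, g, b]          # overwrite a single pixel
        return out

    def objective(params):
        # DE minimises the confidence assigned to the true class.
        return predict(apply(params, image))[true_label]

    # Search space: pixel coordinates plus an RGB colour in [0, 1].
    bounds = [(0, h - 1), (0, w - 1), (0, 1), (0, 1), (0, 1)]
    result = differential_evolution(objective, bounds, maxiter=max_iter,
                                    popsize=pop_size, tol=1e-5, seed=0)
    return apply(result.x, image)

adv = one_pixel_attack(np.random.rand(32, 32, 3), true_label=3)
```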
PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples
Adversarial perturbations of normal images are usually imperceptible to humans, but they can seriously confuse state-of-the-art machine learning models. What makes them so special in the eyes of…
Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality
The analysis of the LID characteristic for adversarial regions not only motivates new directions of effective adversarial defense, but also opens up more challenges for developing new attacks to better understand the vulnerabilities of DNNs.
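For context, a minimal sketch of the standard maximum-likelihood LID estimator, computed from k-nearest-neighbour distance ratios, is given below; the batch of feature vectors and the choice of k are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of the MLE estimator of local intrinsic dimensionality (LID):
# for each point, LID is estimated from the ratios of its k nearest-neighbour distances.
import numpy as np

def lid_mle(query: np.ndarray, reference: np.ndarray, k: int = 20) -> np.ndarray:
    """Estimate LID for each row of `query` against a `reference` batch."""
    # Pairwise Euclidean distances (n_query x n_reference).
    dists = np.linalg.norm(query[:, None, :] - reference[None, :, :], axis=-1)
    dists = np.sort(dists, axis=1)[:, 1:k + 1]   # drop self-distance, keep k neighbours
    r_max = dists[:, -1:]                        # distance to the k-th neighbour
    # LID = -(mean of log(r_i / r_k))^-1
    return -1.0 / np.mean(np.log(dists / r_max + 1e-12), axis=1)

feats = np.random.randn(200, 64)
print(lid_mle(feats, feats, k=20)[:5])
```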
Towards Evaluating the Robustness of Neural Networks
It is demonstrated that defensive distillation does not significantly increase the robustness of neural networks, and three new attack algorithms are introduced that are successful on both distilled and undistilled neural networks with 100% probability.
Towards Deep Learning Models Resistant to Adversarial Attacks
This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
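A minimal PyTorch sketch of the first-order adversary referred to here, projected gradient descent (PGD) within an L-infinity ball, is shown below; the toy model and hyper-parameters are assumptions for illustration, not the paper's exact setup.

```python
# Minimal sketch of an L-infinity PGD attack with a random start.
import torch
import torch.nn as nn

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = nn.functional.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the L-infinity ball around x.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

# Adversarial training then minimises the loss on these worst-case inputs:
# loss = cross_entropy(model(pgd_attack(model, x, y)), y)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
adv = pgd_attack(model, x, y)
```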
Universal Adversarial Perturbations
The surprising existence of universal perturbations reveals important geometric correlations among the high-dimensional decision boundary of classifiers and outlines potential security breaches with the existence of single directions in the input space that adversaries can possibly exploit to break a classifier on most natural images.
Explaining and Harnessing Adversarial Examples
It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
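The concrete method associated with this linearity argument is the fast gradient sign method (FGSM): a single step in the direction of the sign of the input gradient. A minimal PyTorch sketch follows, with a toy model standing in for any differentiable classifier.

```python
# Minimal sketch of FGSM: one gradient-sign step of size eps.
import torch
import torch.nn as nn

def fgsm(model, x, y, eps=8/255):
    x = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
x_adv = fgsm(model, x, y)
```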
Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks
The study shows that defensive distillation can reduce the effectiveness of sample creation from 95% to less than 0.5% on a studied DNN, and analytically investigates the generalizability and robustness properties granted by the use of defensive distillation when training DNNs.
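As a rough sketch of the mechanism (not the paper's full training recipe), the snippet below trains a distilled network on soft labels produced by a teacher run at temperature T; the loss form, the T value, and the toy logits are illustrative assumptions.

```python
# Minimal sketch of training on temperature-softened teacher labels,
# the core ingredient of defensive distillation.
import torch
import torch.nn.functional as F

def soft_label_loss(student_logits, teacher_logits, T=20.0):
    soft_targets = F.softmax(teacher_logits / T, dim=1)   # teacher at temperature T
    log_probs = F.log_softmax(student_logits / T, dim=1)  # student trained at the same T
    return -(soft_targets * log_probs).sum(dim=1).mean()  # cross-entropy on soft labels

teacher_logits = torch.randn(8, 10)
student_logits = torch.randn(8, 10, requires_grad=True)
loss = soft_label_loss(student_logits, teacher_logits)
loss.backward()
# At test time the distilled network is used at T = 1, which sharpens its
# softmax and shrinks the input gradients an attacker can exploit.
```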
Ensemble Adversarial Training: Attacks and Defenses
This work finds that adversarial training remains vulnerable to black-box attacks, in which perturbations computed on undefended models are transferred to the defended model, and to a powerful novel single-step attack that escapes the non-smooth vicinity of the input data via a small random step.
Understanding the One Pixel Attack: Propagation Maps and Locality Analysis
Propagation Maps reveal that even in extremely deep networks such as ResNet, a modification in one pixel easily propagates to the last layer; this initially local perturbation is also shown to spread, becoming a global one and reaching absolute difference values close to the maximum value of the original feature maps in a given layer.