• Corpus ID: 167217931

# Trust but Verify: An Information-Theoretic Explanation for the Adversarial Fragility of Machine Learning Systems, and a General Defense against Adversarial Attacks

@article{Yi2019TrustBV,
  title={Trust but Verify: An Information-Theoretic Explanation for the Adversarial Fragility of Machine Learning Systems, and a General Defense against Adversarial Attacks},
  author={Jirong Yi and Hui Xie and Leixin Zhou and Xiaodong Wu and Weiyu Xu and Raghuraman Mudumbai},
  journal={ArXiv},
  year={2019},
  volume={abs/1905.11381}
}
• Published 25 May 2019
• Computer Science
• ArXiv
Deep-learning based classification algorithms have been shown to be susceptible to adversarial attacks: minor changes to the input of classifiers can dramatically change their outputs, while being imperceptible to humans. […] We further show theoretical guarantees for the performance of this detection method. We present experimental results with (a) a voice recognition system and (b) a digit recognition system using the MNIST database to demonstrate the effectiveness of the proposed defense…
4 Citations

### Derivation of Information-Theoretically Optimal Adversarial Attacks with Applications to Robust Machine Learning

• Computer Science
ArXiv
• 2020
It is shown that it is much harder to achieve adversarial attacks that minimize mutual information when multiple redundant copies of the input signal are available, providing additional support for the recently proposed "feature compression" hypothesis as an explanation for the adversarial vulnerability of deep learning classifiers.

### Do Deep Minds Think Alike? Selective Adversarial Attacks for Fine-Grained Manipulation of Multiple Deep Neural Networks

• Computer Science
ArXiv
• 2020
Preliminary findings from these experiments show that it is in fact very easy to selectively manipulate multiple MNIST classifiers simultaneously, even when the classifiers are identical in their architectures, training algorithms and training datasets except for random initialization during training.

### An Adaptive Black-box Defense against Trojan Attacks (TrojDef)

• Computer Science
ArXiv
• 2022
A more practical black-box defense against Trojan backdoors, dubbed TrojDef, which outperforms the state-of-the-art defenses and is highly stable under different settings, even when the classifier architecture, the training process, or the hyper-parameters change.

### Mutual Information Learned Classifiers: an Information-theoretic Viewpoint of Training Deep Learning Classification Systems

• Computer Science
ArXiv
• 2022
This paper shows that the existing cross entropy loss minimization problem essentially learns the label conditional entropy of the underlying data distribution of the dataset, and proposes a mutual information learning framework where deep neural network classifiers are trained via learning the mutual information between the label and the input.

## References

SHOWING 1-10 OF 64 REFERENCES

### Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks

• Computer Science
2016 IEEE Symposium on Security and Privacy (SP)
• 2016
The study shows that defensive distillation can reduce the effectiveness of adversarial sample creation from 95% to less than 0.5% on a studied DNN, and analytically investigates the generalizability and robustness properties granted by the use of defensive distillation when training DNNs.

### Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models

• Computer Science
ICLR
• 2018
The proposed Defense-GAN, a new framework leveraging the expressive capability of generative models to defend deep neural networks against adversarial perturbations, is empirically shown to be consistently effective against different attack methods and improves on existing defense strategies.

### Towards Evaluating the Robustness of Neural Networks

• Computer Science
2017 IEEE Symposium on Security and Privacy (SP)
• 2017
It is demonstrated that defensive distillation does not significantly increase the robustness of neural networks, and three new attack algorithms are introduced that are successful on both distilled and undistilled neural networks with 100% probability.

### Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey

• Computer Science
IEEE Access
• 2018
This paper presents the first comprehensive survey on adversarial attacks on deep learning in computer vision, reviewing the works that design adversarial attack, analyze the existence of such attacks and propose defenses against them.

### Rademacher Complexity for Adversarially Robust Generalization

• Computer Science
ICML
• 2019
For binary linear classifiers, it is shown that the adversarial Rademacher complexity is never smaller than its natural counterpart, and it has an unavoidable dimension dependence, unless the weight vector has bounded $\ell_1$ norm.

### Discrete Adversarial Attacks and Submodular Optimization with Applications to Text Classification

• Computer Science
MLSys
• 2019
It is proved that this set function is submodular for some popular neural network text classifiers under a simplifying assumption, which guarantees a $1-1/e$ approximation factor for attacks that use the greedy algorithm.

### GanDef: A GAN based Adversarial Training Defense for Neural Network Classifier

• Computer Science
SEC
• 2019
This paper designs a Generative Adversarial Net (GAN) based adversarial training defense, dubbed GanDef, which utilizes a competition game to regulate the feature selection during the training and analytically shows that GanDef can train a classifier so it can defend against adversarial examples.

### Adversarial examples in the physical world

• Computer Science
ICLR
• 2017
It is found that a large fraction of adversarial examples are classified incorrectly even when perceived through the camera, which shows that even in physical-world scenarios, machine learning systems are vulnerable to adversarial examples.

### Guessing Smart: Biased Sampling for Efficient Black-Box Adversarial Attacks

• Computer Science, Mathematics
2019 IEEE/CVF International Conference on Computer Vision (ICCV)
• 2019
It is shown that a specific class of attacks, Boundary Attacks, can be reinterpreted as a biased sampling framework that gains efficiency from domain knowledge, and three such biases, image frequency, regional masks and surrogate gradients, are identified and evaluated against an ImageNet classifier.

### Seeing isn't Believing: Practical Adversarial Attack Against Object Detectors

• Computer Science
• 2018
The feature-interference reinforcement (FIR) method and the enhanced realistic constraints generation (ERG) are proposed to enhance robustness, along with the nested-AE, which combines two AEs together to attack object detectors at both long and short distances.