• Corpus ID: 167217931

Trust but Verify: An Information-Theoretic Explanation for the Adversarial Fragility of Machine Learning Systems, and a General Defense against Adversarial Attacks

Jirong Yi, Hui Xie, Leixin Zhou, Xiaodong Wu, Weiyu Xu, Raghuraman Mudumbai
Deep-learning based classification algorithms have been shown to be susceptible to adversarial attacks: minor changes to the input of classifiers can dramatically change their outputs, while being imperceptible to humans. […] We further show theoretical guarantees for the performance of this detection method. We present experimental results with (a) a voice recognition system, and (b) a digit recognition system using the MNIST database, to demonstrate the effectiveness of the proposed defense…

Derivation of Information-Theoretically Optimal Adversarial Attacks with Applications to Robust Machine Learning

It is shown that it is much harder to achieve adversarial attacks for minimizing mutual information when multiple redundant copies of the input signal are available, providing additional support to the recently proposed "feature compression" hypothesis as an explanation for the adversarial vulnerability of deep learning classifiers.

Do Deep Minds Think Alike? Selective Adversarial Attacks for Fine-Grained Manipulation of Multiple Deep Neural Networks

Preliminary findings from these experiments show that it is in fact very easy to selectively manipulate multiple MNIST classifiers simultaneously, even when the classifiers are identical in their architectures, training algorithms and training datasets except for random initialization during training.

An Adaptive Black-box Defense against Trojan Attacks (TrojDef)

A more practical black-box defense against Trojan backdoors, dubbed TrojDef, is proposed; it outperforms the state-of-the-art defenses and is highly stable under different settings, even when the classifier architecture, the training process, or the hyper-parameters change.

Mutual Information Learned Classifiers: an Information-theoretic Viewpoint of Training Deep Learning Classification Systems

This paper shows that the existing cross entropy loss minimization problem essentially learns the label conditional entropy of the underlying data distribution of the dataset, and proposes a mutual information learning framework where deep neural network classifiers are trained via learning the mutual information between the label and the input.



Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks

The study shows that defensive distillation can reduce the effectiveness of sample creation from 95% to less than 0.5% on a studied DNN, and analytically investigates the generalizability and robustness properties granted by the use of defensive distillation when training DNNs.

Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models

The proposed Defense-GAN, a new framework leveraging the expressive capability of generative models to defend deep neural networks against adversarial perturbations, is empirically shown to be consistently effective against different attack methods and improves on existing defense strategies.

Towards Evaluating the Robustness of Neural Networks

It is demonstrated that defensive distillation does not significantly increase the robustness of neural networks, and three new attack algorithms are introduced that are successful on both distilled and undistilled neural networks with 100% probability.

Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey

This paper presents the first comprehensive survey on adversarial attacks on deep learning in computer vision, reviewing the works that design adversarial attacks, analyze the existence of such attacks, and propose defenses against them.

Rademacher Complexity for Adversarially Robust Generalization

For binary linear classifiers, it is shown that the adversarial Rademacher complexity is never smaller than its natural counterpart, and it has an unavoidable dimension dependence, unless the weight vector has bounded $\ell_1$ norm.

Discrete Adversarial Attacks and Submodular Optimization with Applications to Text Classification

It is proved that this set function is submodular for some popular neural network text classifiers under simplifying assumptions, which guarantees a $1-1/e$ approximation factor for attacks that use the greedy algorithm.

GanDef: A GAN based Adversarial Training Defense for Neural Network Classifier

This paper designs a Generative Adversarial Net (GAN) based adversarial training defense, dubbed GanDef, which utilizes a competition game to regulate the feature selection during the training and analytically shows that GanDef can train a classifier so it can defend against adversarial examples.

Adversarial examples in the physical world

It is found that a large fraction of adversarial examples are classified incorrectly even when perceived through the camera, which shows that even in physical world scenarios, machine learning systems are vulnerable to adversarial examples.

Guessing Smart: Biased Sampling for Efficient Black-Box Adversarial Attacks

It is shown that a specific class of attacks, Boundary Attacks, can be reinterpreted as a biased sampling framework that gains efficiency from domain knowledge, and three such biases, image frequency, regional masks and surrogate gradients, are identified and evaluated against an ImageNet classifier.

Seeing isn't Believing: Practical Adversarial Attack Against Object Detectors

The feature-interference reinforcement (FIR) method and the enhanced realistic constraints generation (ERG) are proposed to enhance robustness, along with the nested-AE, which combines two AEs to attack object detectors at both long and short distances.