Adversarially Robust Distillation

@inproceedings{Goldblum2020AdversariallyRD,
  title={Adversarially Robust Distillation},
  author={Micah Goldblum and Liam Fowl and S. Feizi and T. Goldstein},
  booktitle={AAAI},
  year={2020}
}
Knowledge distillation is effective for producing small, high-performance neural networks for classification, but these small networks are vulnerable to adversarial attacks. This paper first studies how adversarial robustness transfers from teacher to student during knowledge distillation and finds that a large amount of robustness may be inherited by the student even when distilled on only clean images. Second, we introduce Adversarially Robust Distillation (ARD) for distilling robustness onto…
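Roughly, ARD trains the student to match a robust teacher's predictions on clean images while the student itself is evaluated on adversarial examples crafted against it. The sketch below illustrates one training step in that spirit in PyTorch; the attack schedule, loss weighting, temperature, and the names student, teacher, and optimizer are illustrative assumptions, not the paper's exact settings.

import torch
import torch.nn.functional as F

def ard_step(student, teacher, x, y, optimizer,
             epsilon=8/255, step_size=2/255, steps=10, temp=1.0, w=0.9):
    # One training step in the spirit of Adversarially Robust Distillation:
    # the student is pushed toward the teacher's *clean* soft labels on
    # adversarial examples crafted against the student. Hyperparameters
    # (epsilon, step_size, steps, temp, w) are illustrative values.
    teacher.eval()
    with torch.no_grad():
        t_soft = F.softmax(teacher(x) / temp, dim=1)  # teacher's clean soft labels

    # Inner maximization: PGD-style attack on the student's distillation loss.
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        kl = F.kl_div(F.log_softmax(student(x_adv) / temp, dim=1), t_soft,
                      reduction="batchmean")
        grad, = torch.autograd.grad(kl, x_adv)
        x_adv = x_adv.detach() + step_size * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x + epsilon), x - epsilon).clamp(0, 1)

    # Outer minimization: distillation loss on adversarial inputs
    # plus a small clean cross-entropy term.
    optimizer.zero_grad()
    adv_kl = F.kl_div(F.log_softmax(student(x_adv) / temp, dim=1), t_soft,
                      reduction="batchmean") * (temp ** 2)
    loss = w * adv_kl + (1 - w) * F.cross_entropy(student(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()

In a sketch like this, the weight w and the strength of the inner attack are the natural knobs for trading clean accuracy against robustness.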
Citations

Revisiting Adversarial Robustness Distillation: Robust Soft Labels Make Student Better
Adversarial training is one effective approach for training robust deep neural networks against adversarial attacks. While being able to bring reliable robustness, adversarial training (AT) methods…
Feature Distillation With Guided Adversarial Contrastive Learning
This paper proposes a novel approach called Guided Adversarial Contrastive Distillation (GACD), to effectively transfer adversarial robustness from teacher to student with features, and demonstrates that students produced by this approach capture more structural knowledge from teachers and learn more robust features under adversarial attacks.
Renofeation: A Simple Transfer Learning Method for Improved Adversarial Robustness
Fine-tuning through knowledge transfer from a pre-trained model on a large-scale dataset is a widely spread approach to effectively build models on small-scale datasets. In this work, we show that a…
Rethinking Uncertainty in Deep Learning: Whether and How it Improves Robustness
It is shown that uncertainty-promotion regularizers complement AT in a principled manner, consistently improving performance on both clean examples and under various attacks, especially attacks with large perturbations.
Anti-Adversarial Input with Self-Ensemble Model Transformations
Deep-learning models that perform image classification tasks are vulnerable to adversarial inputs that lower model accuracy and recall. Many mitigation techniques sacrifice original model accuracy to…
Prepare for the Worst: Generalizing across Domain Shifts with Adversarial Batch Normalization
This work adapts adversarial training by adversarially perturbing feature statistics, rather than image pixels, to produce models that are robust to domain shift, and significantly improves the performance of ResNet-50 on ImageNet-C, Stylized-ImageNet, and ImageNet-Instagram over standard training practices.
Adversarial Robustness for Unsupervised Domain Adaptation
Muhammad Awais, Fengwei Zhou, +4 authors, Zhenguo Li (2021)
Extensive Unsupervised Domain Adaptation (UDA) studies have shown great success in practice by learning transferable representations across a labeled source domain and an unlabeled target domain with…
Reliable Adversarial Distillation with Unreliable Teachers
This paper proposes reliable introspective adversarial distillation (IAD), a method for improving upon teachers in terms of adversarial robustness in which students partially, rather than fully, trust their teachers.
RoSearch: Search for Robust Student Architectures When Distilling Pre-trained Language Models
RoSearch is proposed as a comprehensive framework for searching student architectures with better adversarial robustness when performing knowledge distillation; results show that RoSearch improves the robustness of student models.
Adversarial Examples Make Strong Poisons
The method, adversarial poisoning, is substantially more effective than existing poisoning methods for secure dataset release, and a poisoned version of ImageNet is released to encourage research into the strength of this form of data obfuscation.

References

Showing 1-10 of 38 references
Towards Evaluating the Robustness of Neural Networks
It is demonstrated that defensive distillation does not significantly increase the robustness of neural networks, and three new attack algorithms are introduced that are successful on both distilled and undistilled neural networks with 100% probability.
Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks
The study shows that defensive distillation can reduce effectiveness of sample creation from 95% to less than 0.5% on a studied DNN, and analytically investigates the generalizability and robustness properties granted by the use of defensive distillation when training DNNs.
Adversarial Training for Free!
This work presents an algorithm that eliminates the overhead cost of generating adversarial examples by recycling the gradient information computed when updating model parameters, and achieves comparable robustness to PGD adversarial training on the CIFAR-10 and CIFAR-100 datasets at negligible additional cost compared to natural training.
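The gradient-recycling idea summarized above can be sketched roughly as follows: each minibatch is replayed a few times, and the single backward pass per replay is used both to update the model and to take an ascent step on a shared perturbation. The replay count, epsilon, and the per-batch perturbation handling are simplifying assumptions rather than the paper's exact procedure.

import torch
import torch.nn.functional as F

def free_adv_train_epoch(model, loader, optimizer, epsilon=8/255, replays=4):
    # One epoch in the spirit of "free" adversarial training: each minibatch
    # is replayed `replays` times, and the same backward pass is reused both
    # to update the model weights and to update the adversarial perturbation.
    model.train()
    delta = None
    for x, y in loader:
        if delta is None or delta.shape != x.shape:
            delta = torch.zeros_like(x)  # perturbation carried across minibatches
        for _ in range(replays):
            delta.requires_grad_(True)
            loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()  # model update from the same backward pass
            # Recycle the input gradient for an ascent step on the perturbation.
            delta = (delta + epsilon * delta.grad.sign()).clamp(-epsilon, epsilon).detach()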
Explaining and Harnessing Adversarial Examples
It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
Feature Denoising for Improving Adversarial Robustness
It is suggested that adversarial perturbations on images lead to noise in the features constructed by these networks, and new network architectures are developed that increase adversarial robustness by performing feature denoising.
You Only Propagate Once: Painless Adversarial Training Using Maximal Principle
This work fully exploits the structure of deep neural networks and proposes a novel strategy to decouple the adversary update from gradient back-propagation, which avoids forward- and backward-propagating the data too many times in one iteration and restricts the core descent-direction computation to the first layer of the network, thus speeding up every iteration significantly.
Towards Compact and Robust Deep Neural Networks
This work proposes a new pruning method that can create compact networks while preserving both benign accuracy and robustness of a network, and ensures that the training objectives of the pre-training and fine-tuning steps match the training objective of the desired robust model.
Robustness of Compressed Convolutional Neural Networks
This work studies how robust CNN models are with respect to state-of-the-art compression techniques such as quantization, and reveals that compressed models are naturally more robust than compact models.
Boosting Adversarial Attacks with Momentum
A broad class of momentum-based iterative algorithms is proposed to boost adversarial attacks by integrating a momentum term into the iterative attack process, which stabilizes update directions and helps escape poor local maxima during the iterations, resulting in more transferable adversarial examples.
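A minimal sketch of a momentum-based iterative attack of this kind is below; the budget epsilon, the number of steps, the decay factor mu, and the assumption of 4-D image batches in [0, 1] are illustrative choices rather than the paper's exact configuration.

import torch
import torch.nn.functional as F

def momentum_iterative_attack(model, x, y, epsilon=8/255, steps=10, mu=1.0):
    # Momentum-based iterative attack: accumulate a decayed sum of
    # L1-normalized gradients and step along its sign at each iteration.
    step_size = epsilon / steps
    g = torch.zeros_like(x)              # momentum buffer
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        # Normalize the gradient by its per-example L1 norm before accumulating.
        g = mu * g + grad / grad.abs().flatten(1).sum(dim=1).view(-1, 1, 1, 1)
        x_adv = x_adv.detach() + step_size * g.sign()
        x_adv = torch.max(torch.min(x_adv, x + epsilon), x - epsilon).clamp(0, 1)
    return x_adv.detach()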
You Only Propagate Once: Accelerating Adversarial Training via Maximal Principle
It is shown that adversarial training can be cast as a discrete-time differential game, and the proposed algorithm YOPO (You Only Propagate Once) can achieve comparable defense accuracy with approximately 1/5 to 1/4 of the GPU time of the projected gradient descent (PGD) algorithm.