Revisiting Adversarial Robustness Distillation: Robust Soft Labels Make Student Better

Bojia Zi, Shihao Zhao, Xingjun Ma and Yu-Gang Jiang. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
Adversarial training (AT) is an effective approach for training deep neural networks that are robust to adversarial attacks. While it delivers reliable robustness, AT generally favors high-capacity models: the larger the model, the better the robustness. This limits its effectiveness on small models, which are preferable in scenarios where storage or computing resources are scarce (e.g., mobile devices). In this paper, we leverage the…
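The core idea of robust-soft-label distillation is to train the small student to match a robust teacher's softened predictions rather than hard labels. A minimal pure-Python sketch of that objective, with illustrative function names and temperature value (not taken from the paper):

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution at a given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete probability distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def robust_soft_label_loss(student_logits, teacher_logits, temperature=2.0):
    """Distillation loss: KL divergence from the robust teacher's soft labels
    to the student's softened predictions."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return kl_divergence(teacher_probs, student_probs)
```

In adversarial robustness distillation this loss is typically evaluated on adversarial examples crafted against the student, with the teacher's soft labels replacing the one-hot ground truth.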

Enhanced Accuracy and Robustness via Multi-teacher Adversarial Distillation

Multi-Teacher Adversarial Robustness Distillation (MTARD) is introduced, which uses multiple large teacher models (an adversarial teacher and a clean teacher) to guide a small student model during adversarial training via knowledge distillation, improving both the robust and the clean accuracy of small models.

ARDIR: Improving Robustness using Knowledge Distillation of Internal Representation

This work proposes Adversarial Robust Distillation with Internal Representation (ARDIR), which uses the internal representations of the teacher model as labels for adversarial training, yielding more robust student models.

Improving Corruption and Adversarial Robustness by Enhancing Weak Subnets

It is shown that the proposed robust training method, EWS, greatly improves robustness against corrupted images as well as accuracy on clean data, and is complementary to many state-of-the-art data augmentation approaches.

A Survey on Efficient Methods for Adversarial Robustness

This paper presents a comprehensive survey on efficient adversarial robustness methods with an aim to present a holistic outlook to make future exploration more systematic and exhaustive.

Exploring Architectural Ingredients of Adversarially Robust Deep Neural Networks

This paper conducts a comprehensive investigation of the impact of network width and depth on the robustness of adversarially trained DNNs and provides a theoretical analysis explaining why such network configurations can help robustness.

DAFT: Distilling Adversarially Fine-tuned Models for Better OOD Generalization

This work proposes DAFT, a new method based on the intuition that an adversarially robust combination of a large number of rich features should provide OOD robustness, and demonstrates that it achieves improvements over the current state-of-the-art OOD generalization methods.

Accelerating Certified Robustness Training via Knowledge Transfer

The experiments on CIFAR-10 show that CRT speeds up certified robustness training by 8× on average across three different architecture generations while achieving robustness comparable to state-of-the-art methods.

Robust Few-shot Learning Without Using any Adversarial Samples

Inspired by the cognitive decision-making process in humans, a simple but effective alternative that does not require any adversarial samples is proposed, yielding a large improvement in adversarial accuracy under PGD and the state-of-the-art AutoAttack.

Maximum Likelihood Distillation for Robust Modulation Classification

This work combines knowledge distillation and adversarial training to build more robust AMC systems, and proposes using the Maximum Likelihood function, which can solve the AMC problem in offline settings, to generate better training labels.

Improving Robustness by Enhancing Weak Subnets

Results indicate that improving the performance of subnets through EWS reduces both clean and corrupted error across a range of state-of-the-art data augmentation schemes.

Adversarially Robust Distillation

It is found that a large amount of robustness may be inherited by the student even when distilled on only clean images, and Adversarially Robust Distillation (ARD) is introduced for distilling robustness onto student networks.

Feature Distillation With Guided Adversarial Contrastive Learning

This paper proposes a novel approach called Guided Adversarial Contrastive Distillation (GACD), to effectively transfer adversarial robustness from teacher to student with features, and demonstrates that students produced by this approach capture more structural knowledge from teachers and learn more robust features under adversarial attacks.

Adversarial Robustness through Local Linearization

A novel regularizer is introduced that encourages the loss to behave linearly in the vicinity of the training data, penalizing gradient obfuscation while encouraging robustness; extensive experiments on CIFAR-10 and ImageNet show that models trained with this regularizer avoid gradient obfuscation and can be trained significantly faster than with adversarial training.
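The quantity being regularized is the gap between the loss and its first-order Taylor expansion around a training point; a loss that is locally linear has a gap of zero. A toy pure-Python sketch on a scalar input (the function name is illustrative):

```python
def linearity_gap(loss_fn, grad_fn, x, delta):
    """|l(x + delta) - l(x) - delta * l'(x)|: how far the loss deviates from
    its first-order Taylor expansion at x. Zero iff the loss is locally linear
    along delta, which is the behavior the regularizer rewards."""
    return abs(loss_fn(x + delta) - loss_fn(x) - delta * grad_fn(x))
```

In the paper this gap is maximized over perturbations delta inside the threat-model ball and added to the training loss; the sketch just evaluates it for one fixed delta.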

Smooth Adversarial Training

Smooth adversarial training (SAT) replaces non-smooth activation functions with smooth ones, allowing it to find harder adversarial examples and compute better gradient updates during adversarial training; this improves adversarial robustness "for free", i.e., with no drop in accuracy and no increase in computational cost.
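The mechanism is visible in the activations' gradients: ReLU gives exactly zero gradient on inactive units, starving the inner maximization (and the weight update) of signal, while a smooth surrogate such as softplus has a nonzero gradient everywhere. A minimal illustration:

```python
import math

def relu_grad(x):
    """Gradient of ReLU: exactly zero wherever the unit is inactive."""
    return 1.0 if x > 0 else 0.0

def softplus_grad(x):
    """Gradient of softplus log(1 + e^x): the sigmoid, nonzero everywhere,
    so gradient signal flows even through 'inactive' units."""
    return 1.0 / (1.0 + math.exp(-x))
```

At an inactive pre-activation (e.g. x = -1) the ReLU gradient vanishes while the softplus gradient does not, which is why smooth activations help both the attacker and the defender during adversarial training.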

Improving Adversarial Robustness Requires Revisiting Misclassified Examples

This paper proposes a new defense algorithm called MART, which explicitly differentiates between misclassified and correctly classified examples during training, and shows that MART and its variant significantly improve state-of-the-art adversarial robustness.

Improving the Generalization of Adversarial Training with Domain Adaptation

Empirical evaluations demonstrate that ATDA can greatly improve the generalization of adversarial training and the smoothness of the learned models, and outperforms state-of-the-art methods on standard benchmark datasets.

Adversarial Weight Perturbation Helps Robust Generalization

This paper proposes a simple yet effective Adversarial Weight Perturbation (AWP) to explicitly regularize the flatness of weight loss landscape, forming a double-perturbation mechanism in the adversarial training framework that adversarially perturbs both inputs and weights.
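The weight half of AWP's double perturbation can be sketched on a one-parameter model: before the descent step, the weight is nudged in the direction that increases the loss on the adversarial input, so the subsequent update flattens a worst-case neighborhood of the weight-loss landscape. A toy sketch using a finite-difference gradient in place of autograd (function name and step sizes are illustrative):

```python
def awp_perturb(loss_fn, w, x_adv, gamma=0.01, fd=1e-5):
    """One adversarial weight perturbation step: move the scalar weight w
    a small amount in the loss-ascent direction evaluated on the
    adversarial input x_adv."""
    # finite-difference estimate of dL/dw (stands in for autograd)
    g = (loss_fn(w + fd, x_adv) - loss_fn(w - fd, x_adv)) / (2.0 * fd)
    direction = 1.0 if g > 0 else -1.0
    # step sized relative to |w|, loosely mirroring AWP's layer-wise scaling
    return w + gamma * direction * abs(w)
```

In the full method this ascent is applied layer-wise to all weights, the descent step is taken at the perturbed weights, and the perturbation is then removed.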

On the Convergence and Robustness of Adversarial Training

This paper proposes a dynamic training strategy to gradually increase the convergence quality of the generated adversarial examples, which significantly improves the robustness of adversarial training.
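The idea of gradually improving the convergence quality of adversarial examples can be sketched as PGD with a step-count schedule that ramps up over training. A toy pure-Python version on a scalar input (the linear schedule and parameter names are illustrative, not the paper's exact criterion):

```python
def pgd_attack(grad_fn, x, epsilon, steps, step_size):
    """L-infinity PGD on a scalar input: repeated signed-gradient ascent
    steps, each projected back into the epsilon-ball around x."""
    x_adv = x
    for _ in range(steps):
        x_adv = x_adv + step_size * (1.0 if grad_fn(x_adv) >= 0 else -1.0)
        x_adv = min(max(x_adv, x - epsilon), x + epsilon)  # projection
    return x_adv

def curriculum_steps(epoch, total_epochs, max_steps=10):
    """Attack-strength schedule: start with weak (1-step) attacks and ramp
    up linearly, so the convergence quality of the generated adversarial
    examples grows as training proceeds."""
    return 1 + (max_steps - 1) * epoch // max(total_epochs - 1, 1)
```

Early epochs then train on cheap, weakly converged adversarial examples, while later epochs see fully converged ones.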

Feature Denoising for Improving Adversarial Robustness

It is suggested that adversarial perturbations on images lead to noise in the features constructed by these networks, and new network architectures are developed that increase adversarial robustness by performing feature denoising.
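The paper's denoising blocks are learned, non-local operations inside the network; as a stand-in, the effect can be illustrated with the simplest possible feature denoiser, a mean filter over a 1D feature map (this toy filter is an assumption for illustration, not the paper's architecture):

```python
def denoise_features(features, radius=1):
    """Sliding-window mean filter over a 1D feature map: smooths out the
    high-frequency feature noise that adversarial perturbations induce."""
    n = len(features)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = features[lo:hi]
        out.append(sum(window) / len(window))
    return out
```

A single spike in the feature map is spread out and damped, which is the qualitative behavior the learned denoising blocks aim for.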

Improving Adversarial Robustness via Channel-wise Activation Suppressing

It is shown that the proposed Channel-wise Activation Suppressing (CAS) strategy can train models that inherently suppress adversarial activations, and that it can be easily applied to existing defense methods to further improve their robustness.
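CAS learns class-conditional channel importance and suppresses the channels that adversarial perturbations tend to over-activate; a crude illustrative stand-in is to zero all but the top-k channels by a given importance score (the hard top-k cutoff is an assumption for illustration, not CAS's learned soft weighting):

```python
def suppress_channels(activations, importance, keep_ratio=0.5):
    """Zero out the least-important channels of an activation vector,
    keeping only the top keep_ratio fraction ranked by importance."""
    k = max(1, int(len(activations) * keep_ratio))
    ranked = sorted(range(len(activations)),
                    key=lambda i: importance[i], reverse=True)
    keep = set(ranked[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(activations)]
```

Adversarial examples tend to spread activation magnitude across many uninformative channels; suppressing the unimportant ones removes that attack surface.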