Effectiveness of Arbitrary Transfer Sets for Data-free Knowledge Distillation

@article{Nayak2021EffectivenessOA,
  title={Effectiveness of Arbitrary Transfer Sets for Data-free Knowledge Distillation},
  author={Gaurav Kumar Nayak and Konda Reddy Mopuri and Anirban Chakraborty},
  journal={2021 IEEE Winter Conference on Applications of Computer Vision (WACV)},
  year={2021},
  pages={1429-1437}
}
Knowledge Distillation is an effective method to transfer learned knowledge across deep neural networks. Typically, the dataset originally used for training the Teacher model is chosen as the "Transfer Set" to conduct the knowledge transfer to the Student. However, this original training data may not always be freely available due to privacy or sensitivity concerns. In such scenarios, existing approaches either iteratively compose a synthetic set representative of the original training dataset, one…
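For orientation, the sketch below shows how a transfer set is typically consumed in vanilla knowledge distillation: the student is trained to match the teacher's temperature-softened outputs under a KL-divergence loss, as in Hinton et al. It is a minimal illustration only; teacher, student, transfer_loader, and the hyperparameters are placeholder assumptions, not the paper's code.

# A minimal sketch, assuming a PyTorch setup: `teacher`, `student`, and
# `transfer_loader` are placeholders, not the paper's code.
import torch
import torch.nn.functional as F

def distill_on_transfer_set(teacher, student, transfer_loader,
                            temperature=4.0, epochs=10, lr=1e-3, device="cpu"):
    teacher.eval()
    student.train()
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for images, _ in transfer_loader:  # (images, labels); labels are unused
            images = images.to(device)
            with torch.no_grad():
                t_logits = teacher(images)
            s_logits = student(images)
            # KL divergence between temperature-softened teacher and student
            # distributions, scaled by T^2 to keep gradient magnitudes comparable.
            loss = F.kl_div(
                F.log_softmax(s_logits / temperature, dim=1),
                F.softmax(t_logits / temperature, dim=1),
                reduction="batchmean",
            ) * temperature ** 2
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student

In the data-free setting studied here, transfer_loader would be built from synthesized or arbitrary surrogate data rather than the original training set.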
Robust and Resource-Efficient Data-Free Knowledge Distillation by Generative Pseudo Replay
TLDR
A Variational Autoencoder with a training objective customized to learn synthetic data representations optimally; the approach optimizes the expected accuracy of the distilled model while eliminating the large memory overhead incurred by sample-storing methods.
Black-box Few-shot Knowledge Distillation
TLDR
The main idea is to expand the training set by generating a diverse set of out-of-distribution synthetic images using MixUp and a conditional variational auto-encoder, an approach that significantly outperforms recent state-of-the-art few/zero-shot KD methods on image classification tasks (a generic MixUp mixing step is sketched after this list).
Data-Free Knowledge Transfer: A Survey
TLDR
A comprehensive survey on data-free knowledge transfer from the perspectives of knowledge distillation and unsupervised domain adaptation is provided to help readers have a better understanding of the current research status and ideas.
Knowledge Distillation with Distribution Mismatch
TLDR
This work proposes a novel method that is the first to address the challenge of distribution mismatch when performing the KD process, and achieves better accuracy than the standard KD loss function.
Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data
TLDR
This paper tackles an ambitious task, termed out-of-domain knowledge distillation (OOD-KD), which allows KD to be conducted using only OOD data that can be readily obtained at very low cost, and introduces a handy yet surprisingly efficacious approach, dubbed MosaicKD.
Preventing Catastrophic Forgetting and Distribution Mismatch in Knowledge Distillation via Synthetic Data
TLDR
A data-free KD framework that maintains a dynamic collection of generated samples over time, improving the accuracy of student models obtained via KD compared with state-of-the-art approaches on the SVHN, Fashion MNIST and CIFAR100 datasets.
Knowledge Distillation: A Survey
TLDR
A comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, distillation algorithms and applications is provided.
CDFKD-MFS: Collaborative Data-free Knowledge Distillation via Multi-level Feature Sharing
TLDR
This work proposes a framework termed collaborative data-free knowledge distillation via multi-level feature sharing (CDFKD-MFS), which consists of a multi-header student module, an asymmetric adversarial data-free KD module, and an attention-based aggregation module.
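The MixUp step mentioned in the "Black-box Few-shot Knowledge Distillation" summary above can be illustrated with a short sketch. This shows only generic MixUp image mixing under assumed inputs; the paper's conditional variational auto-encoder component is omitted and the function names are hypothetical.

# A minimal sketch of generic MixUp image mixing; the conditional VAE part of
# the referenced paper is omitted and these function names are hypothetical.
import numpy as np
import torch

def mixup_images(x1, x2, alpha=1.0):
    # Convexly combine two images; the mixing weight is drawn from Beta(alpha, alpha).
    lam = float(np.random.beta(alpha, alpha))
    return lam * x1 + (1.0 - lam) * x2

def expand_transfer_set(images, num_new, alpha=1.0):
    # Create `num_new` mixed images from a small pool of seed images (a tensor of
    # shape [N, C, H, W]); the mixed images can then be labeled by querying the teacher.
    n = images.shape[0]
    mixed = []
    for _ in range(num_new):
        i, j = np.random.choice(n, size=2, replace=False)
        mixed.append(mixup_images(images[i], images[j], alpha))
    return torch.stack(mixed)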

References

Showing 1-10 of 24 references
Zero-Shot Knowledge Distillation in Deep Networks
TLDR
This paper synthesizes Data Impressions from the complex Teacher model and utilizes these as surrogates for the original training data samples to transfer its learning to the Student via knowledge distillation, and shows that this framework achieves generalization performance competitive with distillation using the actual training data samples on multiple benchmark datasets.
Zero-shot Knowledge Transfer via Adversarial Belief Matching
TLDR
A novel method is proposed that trains a student to match the predictions of its teacher without using any data or metadata, along with a metric to quantify the degree of belief matching between teacher and student in the vicinity of decision boundaries.
Data-Free Knowledge Distillation for Deep Neural Networks
TLDR
This work presents a method for data-free knowledge distillation that compresses deep neural networks trained on large-scale datasets to a fraction of their size, leveraging only some extra metadata provided with a pretrained model release.
DeGAN : Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier
TLDR
This work uses available data, which may be an imbalanced subset of the original training dataset or a related-domain dataset, to retrieve representative samples from a trained classifier via a novel Data-enriching GAN (DeGAN) framework, bridging the gap between the abundance of available data and the lack of relevant data for the future learning tasks of a given trained network.
Data-Free Learning of Student Networks
TLDR
A novel framework for training efficient deep neural networks by exploiting generative adversarial networks (GANs) is proposed, where the pre-trained teacher network is regarded as a fixed discriminator and the generator is used to derive training samples that obtain the maximum response from the discriminator (a rough sketch of this generator-based scheme appears after the reference list).
Distilling the Knowledge in a Neural Network
TLDR
This work shows that it can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model and introduces a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse.
Rethinking the Inception Architecture for Computer Vision
TLDR
This work explores ways to scale up networks that utilize the added computation as efficiently as possible through suitably factorized convolutions and aggressive regularization.
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms
TLDR
Fashion-MNIST is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms, as it shares the same image size, data format and the structure of training and testing splits.
Model compression
TLDR
This work presents a method for "compressing" large, complex ensembles into smaller, faster models, usually without significant loss in performance.
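As referenced in the "Data-Free Learning of Student Networks" entry above, the sketch below outlines a generator-based data-free distillation step: a generator is trained against the frozen teacher so that its samples elicit confident, class-balanced predictions, and the student is then distilled on those synthetic samples. The specific loss terms, weights, and names are illustrative assumptions, not the authors' exact objective.

# A minimal sketch, assuming PyTorch: a DAFL-style training step in which a
# generator is optimized against the frozen teacher and the student is then
# distilled on the generated batch. Loss terms and names are illustrative.
import torch
import torch.nn.functional as F

def data_free_step(generator, teacher, student, g_opt, s_opt,
                   batch_size=128, noise_dim=100, temperature=4.0, device="cpu"):
    teacher.eval()
    for p in teacher.parameters():
        p.requires_grad_(False)  # the teacher stays frozen throughout

    # Generator step: generated samples should elicit confident,
    # class-balanced predictions from the teacher.
    z = torch.randn(batch_size, noise_dim, device=device)
    fake = generator(z)
    t_logits = teacher(fake)
    pseudo_labels = t_logits.argmax(dim=1)
    onehot_loss = F.cross_entropy(t_logits, pseudo_labels)  # confidence term
    mean_probs = F.softmax(t_logits, dim=1).mean(dim=0)
    balance_loss = (mean_probs * torch.log(mean_probs + 1e-8)).sum()  # negative entropy
    g_loss = onehot_loss + balance_loss
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

    # Student step: distill the teacher's outputs on a batch regenerated
    # with the updated generator.
    with torch.no_grad():
        fake = generator(z)
        t_logits = teacher(fake)
    s_logits = student(fake)
    kd_loss = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=1),
        F.softmax(t_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    s_opt.zero_grad()
    kd_loss.backward()
    s_opt.step()
    return g_loss.item(), kd_loss.item()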