Corpus ID: 233296935

Distilling Knowledge via Knowledge Review

  • Pengguang Chen, Shu Liu, Hengshuang Zhao, Jiaya Jia
  • Published in CVPR 2021
  • Computer Science
Knowledge distillation transfers knowledge from a teacher network to a student network, with the goal of greatly improving the student's performance. Previous methods mostly focus on proposing feature transformations and loss functions between features at the same level to improve effectiveness. We instead study the connection paths across levels between the teacher and student networks, and reveal their great importance. For the first time in knowledge distillation, cross…
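The abstract assumes the standard teacher-to-student setup. For reference only, a minimal sketch of classic soft-target (logit) distillation in the style of Hinton et al. follows. It is not the cross-level method this paper proposes; the temperature `T=4.0` and the `T ** 2` scaling are common conventions assumed here, and the function names are hypothetical.

```python
import math


def softmax(logits, T=1.0):
    """Temperature-scaled softmax; subtracting the max logit avoids overflow."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, T=4.0):
    """KL(p_teacher || p_student) on temperature-softened distributions,
    scaled by T**2 so gradient magnitudes stay comparable across temperatures."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return (T ** 2) * kl


teacher = [8.0, 2.0, 1.0]
student = [5.0, 3.0, 1.0]
loss = kd_loss(student, teacher)        # positive: the softened distributions differ
self_loss = kd_loss(teacher, teacher)   # identical logits give exactly 0.0
```

In practice this term is usually combined with the ordinary cross-entropy on ground-truth labels.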

Citations

Learnable Boundary Guided Adversarial Training
The target is to reduce the degradation of natural accuracy by constraining the logits of the robust model $\mathcal{M}^{robust}$, which takes adversarial examples as input, to be similar to those of a clean model fed with the corresponding natural data.
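The constraint described above can be sketched as a distance between two logit vectors: the robust model's logits on an adversarial example and the clean model's logits on the corresponding natural example. This is a minimal per-example sketch; the choice of mean squared error as the distance and the function name are assumptions for illustration, not necessarily the paper's exact objective.

```python
def logit_matching_loss(robust_logits_adv, clean_logits_nat):
    """Mean squared error between the robust model's logits on an adversarial
    input and the clean model's logits on the matching natural input."""
    if len(robust_logits_adv) != len(clean_logits_nat):
        raise ValueError("logit vectors must have the same length")
    n = len(robust_logits_adv)
    return sum((r - c) ** 2 for r, c in zip(robust_logits_adv, clean_logits_nat)) / n


# Hypothetical logits for one example with three classes.
robust_adv = [1.0, 2.0, 0.5]
clean_nat = [1.5, 2.0, 0.0]
loss = logit_matching_loss(robust_adv, clean_nat)
```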
Response-based Distillation for Incremental Object Detection
  • Tao Feng, Mang Wang
  • Computer Science
  • 2021
Traditional object detectors are ill-equipped for incremental learning. However, directly fine-tuning a well-trained detection model on only new data leads to catastrophic forgetting.

References

Relational Knowledge Distillation
RKD proposes distance-wise and angle-wise distillation losses that penalize structural differences in relations, allowing students to outperform their teachers and achieving state-of-the-art results on standard benchmark datasets.
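The two losses named above can be sketched in plain Python for a small batch of embedding vectors. This is a minimal sketch, assuming the formulation as we recall it from the RKD paper (pairwise distances normalized by their batch mean, cosines of angles over triplets, both compared with a Huber loss); averaging over pairs and triplets and all helper names are our own choices.

```python
import itertools
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def _huber(x, delta=1.0):
    """Huber (smooth L1) penalty used to compare relation potentials."""
    ax = abs(x)
    return 0.5 * ax * ax if ax <= delta else delta * (ax - 0.5 * delta)


def _normalized_distances(embs):
    """Pairwise Euclidean distances divided by their mean over the batch."""
    pairs = list(itertools.combinations(range(len(embs)), 2))
    dists = {p: _norm(_sub(embs[p[0]], embs[p[1]])) for p in pairs}
    mu = sum(dists.values()) / len(dists)
    return {p: d / mu for p, d in dists.items()}


def rkd_distance_loss(student, teacher):
    """Distance-wise loss: mean Huber gap between normalized pairwise distances."""
    ds, dt = _normalized_distances(student), _normalized_distances(teacher)
    return sum(_huber(ds[p] - dt[p]) for p in dt) / len(dt)


def _cos_angle(a, b, c):
    """Cosine of the angle at vertex b in the triangle (a, b, c)."""
    e1, e2 = _sub(a, b), _sub(c, b)
    return sum(x * y for x, y in zip(e1, e2)) / (_norm(e1) * _norm(e2))


def rkd_angle_loss(student, teacher):
    """Angle-wise loss: mean Huber gap between triplet angle cosines."""
    triplets = list(itertools.permutations(range(len(teacher)), 3))
    gaps = [
        _cos_angle(student[i], student[j], student[k])
        - _cos_angle(teacher[i], teacher[j], teacher[k])
        for i, j, k in triplets
    ]
    return sum(_huber(g) for g in gaps) / len(gaps)


# Batch of 3 embeddings; points must be distinct to avoid zero distances.
teacher = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
scaled = [[2 * x for x in e] for e in teacher]  # same structure, different scale
other = [[0.0, 0.0], [2.0, 0.0], [0.0, 0.5]]
```

Because distances are normalized by their batch mean and angles are scale-invariant, a student whose embeddings are a scaled copy of the teacher's incurs no loss, which reflects the focus on relations rather than absolute positions.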
Variational Information Distillation for Knowledge Transfer
An information-theoretic framework for knowledge transfer is proposed that formulates knowledge transfer as maximizing the mutual information between the teacher and student networks, and that consistently outperforms existing methods.
Contrastive Representation Distillation
The proposed contrastive objective outperforms knowledge distillation and other cutting-edge distillers on a variety of knowledge transfer tasks, including single-model compression, ensemble distillation, and cross-modal transfer.
Probabilistic Knowledge Transfer for Deep Representation Learning
A novel knowledge transfer technique is proposed that trains a student model to maintain the same amount of mutual information between the learned representation and a set of (possibly unknown) labels as the teacher model.
Similarity-Preserving Knowledge Distillation
This paper proposes a new form of knowledge distillation loss, inspired by the observation that semantically similar inputs tend to elicit similar activation patterns in a trained network; the student is trained to preserve the pairwise activation similarities that the teacher produces within a batch.
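The similarity-preserving idea can be sketched as follows: for each batch, build a b × b similarity matrix from flattened activations, L2-normalize its rows, and penalize the squared Frobenius distance between the teacher's and student's matrices, scaled by 1/b². This is a minimal sketch of the formulation as we understand it, for a single layer pair; the helper names are hypothetical. Because only b × b matrices are compared, the student and teacher feature widths can differ.

```python
import math


def _row_normalized_gram(feats):
    """Batch similarity matrix G = F F^T with each row L2-normalized.
    Assumes no all-zero feature vector (its row norm would be zero)."""
    gram = [[sum(a * b for a, b in zip(fi, fj)) for fj in feats] for fi in feats]
    out = []
    for row in gram:
        n = math.sqrt(sum(x * x for x in row))
        out.append([x / n for x in row])
    return out


def sp_loss(student_feats, teacher_feats):
    """(1 / b^2) * squared Frobenius distance between normalized similarity matrices."""
    b = len(teacher_feats)
    gs = _row_normalized_gram(student_feats)
    gt = _row_normalized_gram(teacher_feats)
    return sum((s - t) ** 2 for rs, rt in zip(gs, gt) for s, t in zip(rs, rt)) / (b * b)


# Batch of 3: teacher features are 3-d, student features are 2-d.
teacher_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
student_same = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # same pairwise similarities
student_other = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
```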
A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning
A novel technique for knowledge transfer is proposed, in which knowledge from a pretrained deep neural network (DNN) is distilled and transferred to another DNN; a student DNN that learns the distilled knowledge is optimized much faster than the original model and outperforms it.
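The "distilled knowledge" in this paper is, as we understand it, a flow of solution procedure (FSP) matrix: the spatially averaged inner products between the channels of two layers, which the student is trained to match with an L2 loss. A minimal sketch follows, assuming feature maps are given as lists of channels flattened over the same h × w positions and that the student and teacher layers have matching channel counts; the function names are hypothetical.

```python
def fsp_matrix(feat_a, feat_b):
    """FSP matrix between two layers.

    feat_a has m channels and feat_b has n channels; each channel is a flat
    list over the same h*w spatial positions. Entry (i, j) is the spatially
    averaged product of channel i of feat_a and channel j of feat_b.
    """
    hw = len(feat_a[0])
    return [[sum(x * y for x, y in zip(ca, cb)) / hw for cb in feat_b] for ca in feat_a]


def fsp_loss(student_pair, teacher_pair):
    """Squared L2 distance between student and teacher FSP matrices."""
    gs = fsp_matrix(*student_pair)
    gt = fsp_matrix(*teacher_pair)
    return sum((s - t) ** 2 for rs, rt in zip(gs, gt) for s, t in zip(rs, rt))


# One channel in the first layer, two in the second, on a 2x2 map (flattened).
layer1 = [[1.0, 2.0, 3.0, 4.0]]
layer2 = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
layer2_other = [[1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 0.0, 1.0]]
```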
FitNets: Hints for Thin Deep Nets
This paper extends the idea of a student network that could imitate the soft output of a larger teacher network or ensemble of networks, using not only the outputs but also the intermediate representations learned by the teacher as hints to improve the training process and final performance of the student.
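The hint idea can be sketched as an L2 loss between the teacher's hint layer and a regressor applied to the student's guided layer, with the regressor bridging their different widths. This is a minimal sketch, assuming a fully connected regressor with fixed example weights; as we recall, FitNets uses a convolutional regressor trained jointly with the student, and `regress`/`hint_loss` are hypothetical names.

```python
def regress(weights, x):
    """Hypothetical linear regressor y = W x mapping the student's guided-layer
    output to the width of the teacher's hint layer (learned in practice)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def hint_loss(teacher_hint, student_guided, weights):
    """0.5 * squared L2 distance between the teacher hint and the regressed student output."""
    pred = regress(weights, student_guided)
    return 0.5 * sum((t - p) ** 2 for t, p in zip(teacher_hint, pred))


teacher_hint = [1.0, 2.0, 3.0]     # 3-d teacher hint
student_guided = [1.0, 2.0]        # 2-d student guided layer
w_match = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_miss = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
```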
Paraphrasing Complex Network: Network Compression via Factor Transfer
A novel knowledge transfer method is proposed that uses convolutional operations to paraphrase the teacher's knowledge and translate it for the student; the student network trained with the proposed factor transfer method outperforms those trained with conventional knowledge transfer methods.
A Comprehensive Overhaul of Feature Distillation
A novel feature distillation method whose distillation loss is designed to create a synergy among several aspects (teacher transform, student transform, distillation feature position, and distance function), and which achieves significant performance improvements in all tested tasks.
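Two of the listed aspects, the teacher transform and the distance function, can be sketched concretely. As we understand the paper, the teacher's pre-ReLU features pass through a margin ReLU, max(x, m) with a negative margin m, and the distance is a "partial" L2 that ignores positions where the student response is already at or below a non-positive teacher target. This minimal sketch uses a single scalar margin (as we recall, the paper uses per-channel margins estimated from batch statistics) and omits the student transform; the function names are hypothetical.

```python
def margin_relu(features, margin):
    """Teacher transform: keep positive responses, clamp negatives at margin (< 0)."""
    return [max(v, margin) for v in features]


def partial_l2(teacher, student):
    """Squared error, skipped where student <= teacher <= 0: a ReLU would zero
    both values there, so matching them gives no useful signal."""
    total = 0.0
    for t, s in zip(teacher, student):
        if s <= t <= 0:
            continue
        total += (t - s) ** 2
    return total


def ofd_loss(teacher_pre_relu, student_pre_relu, margin=-1.0):
    """Margin-ReLU the teacher's pre-ReLU features, then apply the partial L2."""
    return partial_l2(margin_relu(teacher_pre_relu, margin), student_pre_relu)


teacher_feat = [2.0, -3.0, -0.5]
student_feat = [1.0, -2.0, 0.5]
loss = ofd_loss(teacher_feat, student_feat, margin=-1.0)
```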
Distilling Object Detectors With Fine-Grained Feature Imitation
A fine-grained feature imitation method that exploits the cross-location discrepancy of feature responses at anchor locations near objects, which reveals important information about how the teacher model tends to generalize.
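The near-object imitation can be sketched in two steps: select feature-map locations whose anchors overlap a ground-truth box well, then apply an L2 imitation loss only at those locations. This minimal sketch follows our understanding of the paper (a location is kept if any of its anchors has IoU above ψ times the box's maximum anchor IoU, with ψ = 0.5 assumed here) and should be checked against it. The IoUs are taken as precomputed inputs, the student features are assumed to have already passed through an adaptation layer, and the function names are hypothetical.

```python
def imitation_mask(ious_per_gt, psi=0.5):
    """ious_per_gt[g][loc][a]: IoU of anchor a at location loc with GT box g.
    A location is kept if any of its anchors exceeds psi * (max IoU for that box);
    the final mask is the union over all GT boxes."""
    n_loc = len(ious_per_gt[0])
    mask = [False] * n_loc
    for box_ious in ious_per_gt:
        thresh = psi * max(max(anchors) for anchors in box_ious)
        for loc, anchors in enumerate(box_ious):
            if any(iou > thresh for iou in anchors):
                mask[loc] = True
    return mask


def masked_imitation_loss(student_feats, teacher_feats, mask):
    """Sum of squared feature differences at masked locations, divided by 2 * N_p."""
    n_p = sum(mask)
    if n_p == 0:
        return 0.0
    total = 0.0
    for s, t, keep in zip(student_feats, teacher_feats, mask):
        if keep:
            total += sum((si - ti) ** 2 for si, ti in zip(s, t))
    return total / (2 * n_p)


# One GT box, three locations, two anchors per location.
ious = [[[0.1, 0.2], [0.6, 0.3], [0.4, 0.05]]]
student_feats = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
teacher_feats = [[5.0, 5.0], [1.0, 0.0], [0.0, 0.0]]
mask = imitation_mask(ious)
loss = masked_imitation_loss(student_feats, teacher_feats, mask)
```

Restricting the loss to these locations keeps the student from imitating the large, mostly background, remainder of the feature map.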