Corpus ID: 232170209

Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones

Cheng Cui, Ruoyu Guo, Yuning Du, Dongliang He, Fu Li, Zewu Wu, Qiwen Liu, Shilei Wen, Jizhou Huang, Xiaoguang Hu, Dianhai Yu, Errui Ding, Yanjun Ma
Recently, research efforts have concentrated on revealing how pre-trained models make a difference in neural network performance. Self-supervised and semi-supervised learning techniques have been extensively explored by the community and have proven to have great potential for obtaining a powerful pre-trained model. However, these models require huge training costs (i.e., hundreds of millions of images or training iterations). In this paper, we propose to improve existing baseline…
2 Citations

PP-LCNet: A Lightweight CPU Convolutional Neural Network
  • Cheng Cui, Tingquan Gao, +10 authors Yanjun Ma
  • Computer Science
  • ArXiv
  • 2021
PP-LCNet is a lightweight CPU network based on the MKLDNN acceleration strategy. It improves the performance of lightweight models on multiple tasks and can greatly surpass previous network structures at the same inference time for classification.
PP-YOLOv2: A Practical Object Detector
A collection of existing refinements is comprehensively evaluated to improve the performance of PP-YOLO while keeping the inference time almost unchanged, yielding a significant performance gain.


FitNets: Hints for Thin Deep Nets
This paper extends the idea of a student network that could imitate the soft output of a larger teacher network or ensemble of networks, using not only the outputs but also the intermediate representations learned by the teacher as hints to improve the training process and final performance of the student.
Boosting Self-Supervised Learning via Knowledge Transfer
A novel framework for self-supervised learning is presented that overcomes limitations in designing and comparing different tasks, models, and data domains and achieves state-of-the-art performance on the common benchmarks in PASCAL VOC 2007, ILSVRC12 and Places by a significant margin.
Unsupervised Data Augmentation for Consistency Training
A new perspective on how to effectively noise unlabeled examples is presented, and it is argued that the quality of noising, specifically of the noise produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning.
Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results
The recently proposed Temporal Ensembling has achieved state-of-the-art results in several semi-supervised learning benchmarks, but it becomes unwieldy when learning from large datasets. Mean Teacher, a method that averages model weights instead of label predictions, is proposed to address this.
Knowledge Distillation: A Survey
A comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, distillation algorithms and applications is provided.
Role-Wise Data Augmentation for Knowledge Distillation
It is found empirically that specially tailored data points enable the teacher's knowledge to be demonstrated more effectively to the student and that role-wise data augmentation improves the effectiveness of KD over strong prior approaches.
Revisit Knowledge Distillation: a Teacher-free Framework
It is argued that the success of KD is not fully due to the similarity information between categories, but also to the regularization of soft targets, which is equally or even more important.
A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning
A novel technique for knowledge transfer is presented, in which knowledge from a pretrained deep neural network (DNN) is distilled and transferred to another DNN. The student DNN that learns the distilled knowledge is optimized much faster than the original model and outperforms it.
Distilling the Knowledge in a Neural Network
This work shows that distilling the knowledge in an ensemble of models into a single model can significantly improve the acoustic model of a heavily used commercial system. It also introduces a new type of ensemble composed of one or more full models and many specialist models, which learn to distinguish fine-grained classes that the full models confuse.
Billion-scale semi-supervised learning for image classification
This paper proposes a pipeline, based on a teacher/student paradigm, that leverages a large collection of unlabelled images to improve the performance for a given target architecture, like ResNet-50 or ResNeXt.
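
Several of the works listed above build on the same teacher/student idea, in which a student network is trained to imitate the soft output of a teacher. The following is a minimal sketch of one common form of that objective, a temperature-softened KL divergence between teacher and student outputs. It is not taken from any of the listed papers' code. The function names, the default temperature of 4.0, and the T^2 scaling are assumptions chosen for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum_exp for z in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened outputs.

    Hypothetical helper: the temperature default and the T^2 factor
    (used to keep gradient magnitudes comparable across temperatures)
    are illustrative assumptions, not values from the listed papers.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher soft targets
    log_q = log_softmax(student_logits, temperature)  # student predictions
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]
    student = [0.5, 0.5, 0.5]
    print(distillation_loss(student, teacher))
```

The loss is zero when the student's logits match the teacher's and grows as the two softened distributions diverge. In practice this term is often combined with an ordinary cross-entropy loss on ground-truth labels when labels are available.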