Corpus ID: 231740588

Rethinking Soft Labels for Knowledge Distillation: A Bias-Variance Tradeoff Perspective

@article{Zhou2021RethinkingSL,
  title={Rethinking Soft Labels for Knowledge Distillation: A Bias-Variance Tradeoff Perspective},
  author={Helong Zhou and Liangchen Song and Jiajie Chen and Ye Zhou and Guo-li Wang and J. Yuan and Q. Zhang},
  journal={ArXiv},
  year={2021},
  volume={abs/2102.00650}
}
Knowledge distillation is an effective approach to leveraging a well-trained network, or an ensemble of such networks, referred to as the teacher, to guide the training of a student network. The outputs of the teacher network are used as soft labels to supervise the training of the new network. Recent studies (Müller et al., 2019; Yuan et al., 2020) revealed an intriguing property of soft labels: softening the labels serves as a good regularizer for the student network. From the perspective of…
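For context on the setup described in the abstract, the snippet below is a minimal sketch of a standard temperature-scaled distillation loss, not the method proposed in this paper; the temperature T, mixing weight alpha, and the function name are illustrative assumptions.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.9):
    # Soft labels: the teacher's output distribution at temperature T.
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    # KL divergence between the student's and teacher's temperature-softened
    # distributions; the T**2 factor keeps gradient magnitudes comparable
    # across different temperatures.
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  soft_targets, reduction="batchmean") * (T ** 2)
    # Standard cross-entropy against the ground-truth hard labels.
    ce = F.cross_entropy(student_logits, targets)
    # Mix the two terms; alpha close to 1 weights the soft labels more heavily.
    return alpha * kd + (1.0 - alpha) * ce

# Example usage (hypothetical shapes): a batch of 8 examples, 10 classes.
# s = torch.randn(8, 10); t = torch.randn(8, 10); y = torch.randint(0, 10, (8,))
# loss = distillation_loss(s, t, y)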
1 Citation

Distilling Double Descent

References

Showing 1-10 of 46 references

Contrastive Representation Distillation
Towards Understanding Knowledge Distillation
Similarity-Preserving Knowledge Distillation
Relational Knowledge Distillation
Preparing Lessons: Improve Knowledge Distillation with Better Supervision
Regularizing Neural Networks by Penalizing Confident Output Distributions
When Does Label Smoothing Help?
FitNets: Hints for Thin Deep Nets
Patient Knowledge Distillation for BERT Model Compression
Like What You Like: Knowledge Distill via Neuron Selectivity Transfer