• Computer Science, Mathematics
  • Published in arXiv, 2020

Subclass Distillation

@article{Muller2020SubclassD,
  title={Subclass Distillation},
  author={Rafael M{\"u}ller and Simon Kornblith and Geoffrey Hinton},
  journal={ArXiv},
  year={2020},
  volume={abs/2002.03936}
}
After a large “teacher” neural network has been trained on labeled data, the probabilities that the teacher assigns to incorrect classes reveal a lot of information about the way in which the teacher generalizes. By training a small “student” model to match these probabilities, it is possible to transfer most of the generalization ability of the teacher to the student, often producing a much better small model than directly training the student on the training data. The transfer works best when…
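A minimal sketch of the distillation objective the abstract describes, assuming a PyTorch setup: the student is trained to match the teacher's temperature-softened class probabilities with a KL-divergence term alongside the usual hard-label loss. The function and argument names (`distillation_loss`, `T`, `alpha`) are illustrative assumptions, not code from the paper.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Soft-target matching (KL) combined with hard-label cross-entropy.

    Illustrative sketch; hyperparameters T and alpha are assumptions.
    """
    # Softened teacher probabilities expose the information about
    # incorrect classes that the abstract refers to.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # Scale the KL term by T^2 so its gradients stay comparable in
    # magnitude to the hard-label term when T is large.
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * T * T
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

The temperature `T` controls how much probability mass is spread onto the incorrect classes, which is exactly the signal about the teacher's generalization that the abstract highlights.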

