Corpus ID: 211004051

Search for Better Students to Learn Distilled Knowledge

@article{Gu2020SearchFB,
  title={Search for Better Students to Learn Distilled Knowledge},
  author={Jindong Gu and Volker Tresp},
  journal={ArXiv},
  year={2020},
  volume={abs/2001.11612}
}
Knowledge Distillation, as a model compression technique, has received great attention. The knowledge of a well-performing teacher is distilled into a student with a smaller architecture. The student's architecture is often chosen to be similar to the teacher's, with fewer layers, fewer channels, or both. However, even with the same number of FLOPs or parameters, students with different architectures can achieve different generalization ability. The configuration of a student…
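
The abstract refers to the standard soft-target distillation setup (Hinton et al., 2015), in which the student is trained on the teacher's temperature-softened outputs alongside the hard labels. Below is a minimal PyTorch sketch of that baseline objective only, not the student-architecture search proposed in this paper; the function name, temperature, and weighting are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.9):
    """Soft-target distillation loss: KL on softened outputs + cross-entropy."""
    # Soften both distributions with the same temperature.
    soft_student = F.log_softmax(student_logits / temperature, dim=1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)

    # KL divergence between softened teacher and student; the T^2 factor
    # keeps its gradient magnitude comparable to the hard-label term.
    kd_term = F.kl_div(soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2

    # Ordinary supervised loss on the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)

    return alpha * kd_term + (1.0 - alpha) * ce_term
```

A typical training step would compute `loss = distillation_loss(student(x), teacher(x).detach(), y)`, detaching the teacher's logits so gradients update only the student.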

