FitNets: Hints for Thin Deep Nets

Abstract

While depth tends to improve network performance, it also makes gradient-based training more difficult, since deeper networks tend to be more non-linear. The recently proposed knowledge distillation approach aims to obtain small and fast-to-execute models, and has shown that a student network can imitate the soft output of a larger teacher…
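The soft-output imitation the abstract refers to can be sketched as a temperature-scaled cross-entropy between teacher and student predictions, following the standard knowledge-distillation recipe. This is an illustrative sketch, not the paper's implementation; the temperature value `T=4.0` and the function names are assumptions.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: larger T yields softer probabilities,
    # exposing the teacher's relative confidence across wrong classes.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # Cross-entropy between the teacher's softened output (soft targets)
    # and the student's softened output; the student is trained to
    # minimize this so it imitates the teacher's soft predictions.
    p_teacher = softmax(teacher_logits, T)
    log_q_student = np.log(softmax(student_logits, T))
    return -(p_teacher * log_q_student).sum(axis=-1).mean()
```

In practice this soft-target term is usually combined with the ordinary hard-label cross-entropy; a student whose logits match the teacher's attains the minimum of the soft-target term.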

7 Figures and Tables

Statistics

[Chart: Citations per Year, 2015–2018]

352 Citations

Semantic Scholar estimates that this publication has 352 citations based on the available data.
