HAKD: Hardware Aware Knowledge Distillation
@article{Turner2018HAKDHA, title={HAKD: Hardware Aware Knowledge Distillation}, author={Jack Turner and Elliot J. Crowley and Valentin Radu and Jos{\'e} Cano and Amos J. Storkey and Michael F. P. O’Boyle}, journal={ArXiv}, year={2018}, volume={abs/1810.10460} }
Despite recent developments, deploying deep neural networks on resource-constrained general-purpose hardware remains a significant challenge. […] The proposed approach allows the trade-off between accuracy and performance to be managed explicitly. We apply this approach across three platforms and evaluate it on two networks, MobileNet and DenseNet, on CIFAR-10, showing that HAKD outperforms Deep Compression and Fisher pruning in terms of size, accuracy and performance.
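The abstract does not spell out the distillation objective, so what follows is only a minimal sketch of the standard Hinton-style knowledge-distillation loss that hardware-aware approaches of this kind build on; the temperature `T` and weighting `alpha` are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch of a standard knowledge-distillation loss (Hinton et al.);
# HAKD's exact objective is not given in the abstract, so T and alpha here
# are illustrative assumptions.
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Softened teacher-matching term, rescaled by T^2 as is conventional.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

Under this reading, the loss itself stays fixed; the explicit accuracy/performance trade-off comes from which student architecture is distilled into and how it measures on the target platform.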
3 Citations
Pruning neural networks: is it time to nip it in the bud?
- Computer Science · ArXiv · 2018
This extended abstract examines residual networks obtained through Fisher-pruning and makes two interesting observations, namely, that when time-constrained, it is better to train a simple, smaller network from scratch than prune a large network, and that the architectures obtained through the pruning process prove valuable.
Revisiting Knowledge Distillation for Object Detection
- Computer Science · ArXiv · 2021
Decoupling the teacher and ground-truth distillation in this framework provides interesting properties such as using unlabeled data to further improve the student’s performance, combining multiple teacher models of different architectures, even with different object categories, and reducing the need for labeled data.
Knowledge Distillation for Low-Power Object Detection: A Simple Technique and Its Extensions for Training Compact Models Using Unlabeled Data
- Computer Science · 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
Decoupling the teacher and ground-truth distillation in this framework provides interesting properties such as using unlabeled data to further improve the student’s performance, combining multiple teacher models of different architectures, even with different object categories, and reducing the need for labeled data.
References
Showing 1–10 of 35 references
Moonshine: Distilling with Cheap Convolutions
- Computer Science · NeurIPS · 2018
This work proposes structural model distillation for memory reduction using a strategy that produces a student architecture that is a simple transformation of the teacher architecture: no redesign is needed, and the same hyperparameters can be used.
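As a rough illustration of the kind of "cheap convolution" substitution Moonshine describes, the block below stands a grouped KxK convolution plus a 1x1 channel-mixing convolution in for a dense KxK convolution; the grouping factor is an illustrative choice, not a value from the paper.

```python
# Sketch of a "cheap convolution" substitution in the spirit of Moonshine:
# a grouped KxK convolution followed by a 1x1 pointwise convolution replaces
# a dense KxK convolution. The grouping factor g is illustrative and must
# divide in_ch.
import torch.nn as nn

def cheap_conv(in_ch: int, out_ch: int, k: int = 3, g: int = 4) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=g, bias=False),
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),  # mix channels
    )
```

Because the student keeps the teacher's layout, the teacher's hyperparameters can be reused directly, which is the point the summary above makes.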
Characterising Across-Stack Optimisations for Deep Convolutional Neural Networks
- Computer Science · 2018 IEEE International Symposium on Workload Characterization (IISWC)
This paper unifies the two viewpoints in a Deep Learning Inference Stack and takes an across-stack approach by implementing and evaluating the most common neural network compression techniques and optimising their parallel execution with a range of programming approaches and hardware architectures.
Pruning neural networks: is it time to nip it in the bud?
- Computer Science · ArXiv · 2018
This extended abstract examines residual networks obtained through Fisher-pruning and makes two interesting observations, namely, that when time-constrained, it is better to train a simple, smaller network from scratch than prune a large network, and that the architectures obtained through the pruning process prove valuable.
Scalpel: Customizing DNN pruning to the underlying hardware parallelism
- Computer Science · 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA)
This work implemented weight pruning for several popular networks on a variety of hardware platforms and observed surprising results, including mean speedups of 3.54x, 2.61x, and 1.25x while reducing model sizes by 88%, 82%, and 53%.
Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding
- Computer Science · ICLR · 2016
This work introduces "deep compression", a three-stage pipeline of pruning, trained quantization and Huffman coding that together reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy (a sketch of the first two stages follows below).
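For concreteness, here is a hedged sketch of the first two stages applied to a single weight tensor: magnitude pruning followed by k-means weight sharing. Huffman coding of the resulting indices is lossless and omitted, and the sparsity level, cluster count, and the retraining the paper interleaves between stages are not reproduced here.

```python
# Hedged sketch of magnitude pruning + k-means weight sharing on one tensor.
# Deep Compression additionally retrains after pruning and after quantization,
# and Huffman-codes the indices; those steps are omitted here.
import numpy as np

def prune_and_share(w: np.ndarray, sparsity: float = 0.9, n_clusters: int = 16):
    flat = w.flatten()
    thresh = np.quantile(np.abs(flat), sparsity)   # magnitude threshold
    mask = np.abs(flat) > thresh                   # keep the largest weights
    kept = flat[mask]
    # Simple k-means on the surviving weights: each weight ends up sharing
    # one of n_clusters centroid values.
    centroids = np.linspace(kept.min(), kept.max(), n_clusters)
    for _ in range(10):
        idx = np.abs(kept[:, None] - centroids[None, :]).argmin(axis=1)
        for c in range(n_clusters):
            if np.any(idx == c):
                centroids[c] = kept[idx == c].mean()
    out = np.zeros_like(flat)
    out[mask] = centroids[idx]
    return out.reshape(w.shape), mask.reshape(w.shape)
```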
NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications
- Computer Science · ECCV · 2018
An algorithm that automatically adapts a pre-trained deep neural network to a mobile platform given a resource budget while maximizing accuracy, achieving better accuracy-versus-latency trade-offs on both mobile CPUs and mobile GPUs than state-of-the-art automated network simplification algorithms (a high-level sketch of the loop follows below).
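The adaptation loop the summary describes can be sketched at a high level as follows; all of the callables (latency measurement, layer shrinking, short-term fine-tuning, accuracy evaluation) are hypothetical placeholders standing in for the paper's empirically measured, platform-specific components.

```python
# High-level sketch of a NetAdapt-style adaptation loop. The callables are
# hypothetical placeholders; in the paper, latency comes from empirical
# measurements on the target platform rather than from proxy metrics.
from typing import Any, Callable, Iterable

def adapt(model: Any,
          budget: float,
          step: float,
          latency: Callable[[Any], float],
          layers: Callable[[Any], Iterable[Any]],
          shrink: Callable[[Any, Any, float], Any],
          finetune: Callable[[Any], None],
          accuracy: Callable[[Any], float]) -> Any:
    while latency(model) > budget:
        # One candidate per layer, each meeting the per-step resource
        # reduction; keep the candidate that retains the most accuracy.
        candidates = []
        for layer in layers(model):
            cand = shrink(model, layer, step)
            finetune(cand)                      # short-term fine-tune
            candidates.append((accuracy(cand), cand))
        model = max(candidates, key=lambda c: c[0])[1]
    return model                                # a longer fine-tune follows
```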
Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers
- Computer Science · ICLR · 2018
This paper proposes a channel pruning technique for accelerating the computations of deep convolutional neural networks (CNNs) that focuses on direct simplification of the channel-to-channel computation graph of a CNN, without the need to perform the computationally difficult and not-always-useful task of making the network's high-dimensional tensors structured-sparse.
FitNets: Hints for Thin Deep Nets
- Computer Science · ICLR · 2015
This paper extends the idea of a student network that could imitate the soft output of a larger teacher network or ensemble of networks, using not only the outputs but also the intermediate representations learned by the teacher as hints to improve the training process and final performance of the student.
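A minimal sketch of the "hint" idea: the student's intermediate feature map is passed through a small regressor so it matches the teacher's hint layer, and the gap is penalised with an L2 loss. The 1x1-convolution regressor and the layer pairing are illustrative choices, not the paper's exact configuration.

```python
# Sketch of a FitNets-style hint loss: regress the student's intermediate
# features onto the teacher's, with a small learned regressor bridging the
# difference in channel counts. Regressor and layer choice are assumptions.
import torch.nn as nn
import torch.nn.functional as F

class HintLoss(nn.Module):
    def __init__(self, student_ch: int, teacher_ch: int):
        super().__init__()
        self.regressor = nn.Conv2d(student_ch, teacher_ch, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        mapped = self.regressor(student_feat)
        if mapped.shape[-2:] != teacher_feat.shape[-2:]:
            # Align spatial sizes if the two networks downsample differently.
            mapped = F.interpolate(mapped, size=teacher_feat.shape[-2:])
        return F.mse_loss(mapped, teacher_feat.detach())
```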
DPP-Net: Device-aware Progressive Search for Pareto-optimal Neural Architectures
- Computer Science · ECCV · 2018
DPP-Net (Device-aware Progressive Search for Pareto-optimal Neural Architectures) is proposed, optimizing for both device-related and device-agnostic objectives and achieving better performance, that is, higher accuracy and shorter inference time, on various devices.
Pruning Convolutional Neural Networks for Resource Efficient Inference
- Computer Science · ICLR · 2017
It is shown that pruning can lead to more than 10x theoretical (5x practical) reduction in adapted 3D-convolutional filters with a small drop in accuracy in a recurrent gesture classifier.
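This paper's ranking is commonly summarised as a first-order Taylor criterion: the estimated change in loss from removing a channel is approximated by the magnitude of activation times gradient, averaged over the batch and spatial positions. The sketch below assumes NCHW layout and that gradients are captured (e.g. via a backward hook); it is an illustration of that criterion, not the paper's full iterative prune-and-fine-tune procedure.

```python
# Hedged sketch of a first-order Taylor pruning criterion: rank each channel
# by |activation * dLoss/dactivation|, averaged over the batch and spatial
# dimensions. NCHW layout assumed; gradients are typically captured with a
# backward hook on the layer's output.
import torch

def taylor_channel_scores(activation: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
    return (activation * grad).abs().mean(dim=(0, 2, 3))  # one score per channel
```

Channels with the lowest scores are the first candidates for removal, with fine-tuning between pruning steps.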