Corpus ID: 53085852

HAKD: Hardware Aware Knowledge Distillation

@article{Turner2018HAKDHA,
  title={HAKD: Hardware Aware Knowledge Distillation},
  author={Jack Turner and Elliot J. Crowley and Valentin Radu and Jos{\'e} Cano and Amos J. Storkey and Michael F. P. O’Boyle},
  journal={ArXiv},
  year={2018},
  volume={abs/1810.10460}
}
Despite recent developments, deploying deep neural networks on resource-constrained, general-purpose hardware remains a significant challenge. […] This allows the trade-off between accuracy and performance to be managed explicitly. We have applied this approach across three platforms and evaluated it on two networks, MobileNet and DenseNet, on CIFAR-10. We show that HAKD outperforms Deep Compression and Fisher pruning in terms of size, accuracy and performance.
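The abstract leaves the training objective implicit. As a minimal sketch (assuming the standard Hinton-style distillation loss, with illustrative temperature and alpha values), a student can be trained on a mixture of temperature-softened teacher outputs and hard labels; the hardware-aware part of HAKD concerns selecting and evaluating the student for a target platform, which is not modelled here.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.9):
    # Soft targets: KL divergence between temperature-scaled distributions,
    # rescaled by T^2 so gradient magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

In practice the teacher runs in evaluation mode with gradients disabled, and only the student's parameters are updated.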

Citations

Pruning neural networks: is it time to nip it in the bud?

This extended abstract examines residual networks obtained through Fisher-pruning and makes two interesting observations, namely, that when time-constrained, it is better to train a simple, smaller network from scratch than prune a large network, and that the architectures obtained through the pruning process prove valuable.

Revisiting Knowledge Distillation for Object Detection

Decoupling the teacher and ground-truth distillation in this framework provides interesting properties such as using unlabeled data to further improve the student’s performance, combining multiple teacher models of different architectures, even with different object categories, and reducing the need for labeled data.

Knowledge Distillation for Low-Power Object Detection: A Simple Technique and Its Extensions for Training Compact Models Using Unlabeled Data

Decoupling the teacher and ground-truth distillation in this framework provides interesting properties such as using unlabeled data to further improve the student’s performance, combining multiple teacher models of different architectures, even with different object categories, and reducing the need for labeled data.

References

Showing 1–10 of 35 references

Moonshine: Distilling with Cheap Convolutions

This work proposes structural model distillation for memory reduction using a strategy that produces a student architecture that is a simple transformation of the teacher architecture: no redesign is needed, and the same hyperparameters can be used.
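A hedged sketch of the kind of substitution Moonshine describes: replace a dense k×k convolution with a grouped k×k convolution followed by a 1×1 pointwise convolution, keeping input and output shapes so the teacher's architecture and hyperparameters carry over unchanged. The group count of 4 is an illustrative choice (it must divide the channel count), not a value taken from the paper.

import torch.nn as nn

def cheap_conv(in_ch, out_ch, kernel_size=3, groups=4):
    # Drop-in replacement for nn.Conv2d(in_ch, out_ch, kernel_size,
    # padding=kernel_size // 2): the grouped spatial conv mixes channels only
    # within groups, and the cheap 1x1 conv restores full channel mixing.
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size,
                  padding=kernel_size // 2, groups=groups, bias=False),
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
    )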

Characterising Across-Stack Optimisations for Deep Convolutional Neural Networks

This paper unifies the two viewpoints in a Deep Learning Inference Stack and takes an across-stack approach by implementing and evaluating the most common neural network compression techniques and optimising their parallel execution with a range of programming approaches and hardware architectures.

Pruning neural networks: is it time to nip it in the bud?

This extended abstract examines residual networks obtained through Fisher-pruning and makes two interesting observations, namely, that when time-constrained, it is better to train a simple, smaller network from scratch than prune a large network, and that the architectures obtained through the pruning process prove valuable.

Scalpel: Customizing DNN pruning to the underlying hardware parallelism

This work implemented weight pruning for several popular networks on a variety of hardware platforms and observed surprising results, including mean speedups of 3.54x, 2.61x, and 1.25x while reducing model sizes by 88%, 82%, and 53%.

Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding

This work introduces "deep compression", a three-stage pipeline of pruning, trained quantization, and Huffman coding that works together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
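A compact sketch of the first two stages applied to a single weight tensor, assuming magnitude-based pruning and k-means weight sharing; the retraining between stages and the final Huffman coding of cluster indices are omitted.

import numpy as np
from sklearn.cluster import KMeans

def prune_and_quantize(weights, sparsity=0.9, n_clusters=16):
    # weights: a NumPy array holding one layer's parameters.
    w = weights.flatten().astype(np.float64)
    # Stage 1: zero out the smallest-magnitude weights up to the target sparsity.
    threshold = np.quantile(np.abs(w), sparsity)
    mask = np.abs(w) > threshold
    # Stage 2: cluster the surviving weights; each one is replaced by its
    # centroid, so storage reduces to a small codebook plus an index per weight.
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(w[mask].reshape(-1, 1))
    w[mask] = km.cluster_centers_[km.labels_, 0]
    w[~mask] = 0.0
    return w.reshape(weights.shape), mask.reshape(weights.shape)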

NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications

An algorithm that automatically adapts a pre-trained deep neural network to a mobile platform given a resource budget while maximizing the accuracy, and achieves better accuracy versus latency trade-offs on both mobile CPU and mobile GPU, compared with the state-of-the-art automated network simplification algorithms.

Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers

This paper proposes a channel pruning technique for accelerating the computations of deep convolutional neural networks (CNNs) that focuses on direct simplification of the channel-to-channel computation graph of a CNN without the need to perform a computationally difficult and not-always-useful task.
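One common reading of this approach is to push batch-normalisation scale factors (gamma) toward zero so that the corresponding channels can be dropped from the channel-to-channel graph. The sketch below adds a simple L1 penalty for that purpose; the paper's own ISTA-style optimisation of the scale factors is not reproduced here.

import torch.nn as nn

def bn_scale_l1_penalty(model, strength=1e-4):
    # Sum of |gamma| over every BatchNorm2d layer; adding this term to the
    # task loss drives some scale factors toward zero, marking their
    # channels as candidates for removal.
    penalty = sum(m.weight.abs().sum()
                  for m in model.modules()
                  if isinstance(m, nn.BatchNorm2d))
    return strength * penalty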

FitNets: Hints for Thin Deep Nets

This paper extends the idea of a student network that could imitate the soft output of a larger teacher network or ensemble of networks, using not only the outputs but also the intermediate representations learned by the teacher as hints to improve the training process and final performance of the student.
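A minimal sketch of the hint idea under common assumptions: the student's intermediate feature map is passed through a learned regressor (a 1×1 convolution here) and matched to the teacher's feature map with an L2 loss. The paper's staged training schedule and the choice of hint layer are not shown.

import torch.nn as nn
import torch.nn.functional as F

class HintLoss(nn.Module):
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # The regressor lets a thin student match a wider teacher layer.
        self.regressor = nn.Conv2d(student_channels, teacher_channels,
                                   kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        # L2 distance to the (detached) teacher representation.
        return F.mse_loss(self.regressor(student_feat),
                          teacher_feat.detach())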

DPP-Net: Device-aware Progressive Search for Pareto-optimal Neural Architectures

DPP-Net is proposed: Device-aware Progressive Search for Pareto-optimal Neural Architectures, optimizing for both device-related and device-agnostic objectives, which achieves better performance: higher accuracy and shorter inference time on various devices.

Pruning Convolutional Neural Networks for Resource Efficient Inference

It is shown that pruning can lead to more than 10x theoretical (5x practical) reduction in adapted 3D-convolutional filters with a small drop in accuracy in a recurrent gesture classifier.
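As a hedged sketch of one common reading of the first-order Taylor criterion this work is based on: a channel's importance is estimated from the activation-gradient product at that layer, averaged over the batch and spatial positions and normalised per layer, with the lowest-scoring channels removed first. Activations and gradients would typically be captured with forward/backward hooks.

import torch

def taylor_channel_scores(activations, gradients):
    # activations, gradients: (N, C, H, W) tensors for the layer being pruned.
    # Estimated change in loss from removing each channel: absolute value of
    # the activation-gradient product averaged over batch and spatial dims.
    scores = (activations * gradients).mean(dim=(0, 2, 3)).abs()
    # Per-layer L2 normalisation makes scores comparable across layers.
    return scores / (scores.norm() + 1e-8)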