Corpus ID: 53407760

On Compressing U-net Using Knowledge Distillation

@article{Mangalam2018OnCU,
  title={On Compressing U-net Using Knowledge Distillation},
  author={Karttikeya Mangalam and Mathieu Salzmann},
  journal={ArXiv},
  year={2018},
  volume={abs/1812.00249}
}
We study the use of knowledge distillation to compress the U-net architecture. We show that, while standard distillation is not sufficient to reliably train a compressed U-net, introducing other regularization methods, such as batch normalization and class re-weighting, into the knowledge distillation framework significantly improves the training process. This allows us to compress a U-net by over 1000x, i.e., to 0.1% of its original number of parameters, with only a negligible decrease in performance.
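The abstract does not spell out a concrete objective, but the combination it describes can be pictured as a per-pixel distillation loss whose hard-label term is class re-weighted (batch normalization would live in the student architecture itself, not in the loss). Below is a minimal, hypothetical sketch in PyTorch-style Python; the function name, the temperature, and the mixing weight alpha are assumptions for illustration, not the authors' implementation.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      class_weights, temperature=4.0, alpha=0.5):
    """Hypothetical loss for training a compressed U-net student.

    Combines (i) class re-weighted cross-entropy against the ground-truth
    segmentation masks with (ii) a per-pixel KL term matching the student's
    temperature-softened predictions to the teacher's.
    Shapes: logits are (N, C, H, W), labels are (N, H, W),
    class_weights is (C,).
    """
    # Hard-label term, re-weighted to counter class imbalance.
    ce = F.cross_entropy(student_logits, labels, weight=class_weights)

    # Soft-label term: KL between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable.
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=1)
    log_student = F.log_softmax(student_logits / t, dim=1)
    kd = F.kl_div(log_student, soft_teacher, reduction="batchmean") * (t * t)

    return alpha * ce + (1.0 - alpha) * kd
```

In a training step, teacher_logits would come from the full U-net run in eval mode with gradients disabled, so that only the compressed student is updated.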

Citations

Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks

  • Lin Wang, Kuk-Jin Yoon
  • Computer Science
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2022
This paper provides a comprehensive survey on the recent progress of KD methods together with S-T frameworks typically used for vision tasks and systematically analyzes the research status of KD in vision applications.

DNetUnet: a semi-supervised CNN of medical image segmentation for super-computing AI service

A new convolutional neural network architecture named DNetUnet is proposed, which combines U-Nets with different down-sampling levels and a new dense block as the feature extractor. It is a semi-supervised learning method that not only obtains expert knowledge from the labelled corpus, but also improves generalization by learning from unlabelled data.

Towards End-to-End Deep Learning-based Writer Identification

A fully end-to-end deep learning-based model is proposed, consisting of a U-Net for binarization, a ResNet-50 for feature extraction, and an optimized learnable residual encoding layer to obtain global descriptors.

Obesity May Be Bad: Compressed Convolutional Networks for Biomedical Image Segmentation

The Optimum Mimic Backbone (OMB) is introduced, which forces a compressed CNN to mimic how the original CNN behaves in optimal situations, and achieves higher IoU scores than other state-of-the-art compression techniques in experiments on four popular, diverse biomedical image segmentation datasets.

Low-Memory CNNs Enabling Real-Time Ultrasound Segmentation Towards Mobile Deployment

This article demonstrates the power of ‘thin’ CNNs (with very few feature channels) for fast medical image segmentation, and proposes three approaches to training efficient CNNs that can operate in real-time on a CPU, with a low memory footprint, for minimal compromise in accuracy.

Deep Learning based Intraretinal Layer Segmentation using Cascaded Compressed U-Net

This work proposes a cascaded two-stage network for intraretinal layer segmentation, with both networks being compressed versions of U-Net (CCU-INSEG), and introduces Laplacian-based outlier detection with layer surface hole filling by adaptive non-linear interpolation at the post-processing stage.

Segmentation of roots in soil with U-Net

The feasibility of a U-Net based CNN system for segmenting images of roots in soil and for replacing the manual line-intersect method is demonstrated, showing that deep learning is practical for small research groups that need to create their own custom labelled dataset from scratch.

Fast and Accurate Single-Image Depth Estimation on Mobile Devices, Mobile AI 2021 Challenge: Report

This paper introduces the first Mobile AI challenge, where the target is to develop end-to-end deep learning-based depth estimation solutions that can demonstrate nearly real-time performance on smartphones and IoT platforms.

Intraretinal Layer Segmentation Using Cascaded Compressed U-Nets

A cascaded two-stage network for intraretinal layer segmentation, with both networks being compressed versions of U-Net (CCU-INSEG), is proposed and validated; the results suggest that the method can robustly segment macular scans even from eyes with severe neuroretinal changes.

References

Showing 1-10 of 25 references.

Compression-aware Training of Deep Networks

It is shown that accounting for compression during training allows us to learn much more compact, yet at least as effective, models than state-of-the-art compression techniques.

Learning Efficient Object Detection Models with Knowledge Distillation

This work proposes a new framework to learn compact and fast object detection networks with improved accuracy using knowledge distillation and hint learning and shows consistent improvement in accuracy-speed trade-offs for modern multi-class detection models.

Model compression

This work presents a method for "compressing" large, complex ensembles into smaller, faster models, usually without significant loss in performance.

Distilling the Knowledge in a Neural Network

This work shows that the acoustic model of a heavily used commercial system can be significantly improved by distilling the knowledge in an ensemble of models into a single model, and introduces a new type of ensemble composed of one or more full models and many specialist models that learn to distinguish fine-grained classes that the full models confuse.
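For reference, the "knowledge" transferred in this framework is carried by soft targets obtained from a temperature-scaled softmax over the teacher's logits, roughly in the paper's notation:

```latex
% Soft targets from a temperature-scaled softmax
% (z_i: teacher logit for class i, T: temperature; T > 1 softens the distribution)
p_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}
```

At T = 1 this reduces to the ordinary softmax; larger T exposes more of the teacher's relative confidence in the wrong classes, which is what the smaller student learns from.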

Learning both Weights and Connections for Efficient Neural Network

A method is presented to reduce the storage and computation required by neural networks by an order of magnitude, without affecting their accuracy, by learning only the important connections; redundant connections are pruned using a three-step method.
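The three steps referenced here are, in the usual reading of this work, train, prune, and retrain. A minimal sketch of the pruning step under a simple magnitude-threshold interpretation (PyTorch-style; magnitude_prune and the threshold value are hypothetical, not the paper's exact procedure):

```python
import torch

def magnitude_prune(model, threshold=1e-2):
    """Zero out connections whose absolute weight falls below a threshold.

    Sketch of the prune step in a train -> prune -> retrain loop; the
    returned masks would be re-applied during retraining so that pruned
    connections stay at zero.
    """
    masks = {}
    with torch.no_grad():
        for name, param in model.named_parameters():
            if param.dim() > 1:  # prune weight tensors, leave biases intact
                mask = (param.abs() >= threshold).float()
                param.mul_(mask)
                masks[name] = mask
    return masks
```

During retraining, the masks would be re-applied after each optimizer step so that pruned connections remain zero while the surviving weights recover the lost accuracy.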

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.

Stacked Hourglass Networks for Human Pose Estimation

This work introduces a novel convolutional network architecture for the task of human pose estimation that is described as a “stacked hourglass” network based on the successive steps of pooling and upsampling that are done to produce a final set of predictions.

Model Distillation with Knowledge Transfer in Face Classification, Alignment and Verification

This paper takes face recognition as a breaking point and proposes model distillation with knowledge transfer from face classification to alignment and verification, and uses a common initialization trick to improve the distillation performance of classification.

Face Model Compression by Distilling Knowledge from Neurons

This work addresses model compression for face recognition, where the learned knowledge of a large teacher network or its ensemble is utilized as supervision to train a compact student network by leveraging the essential characteristics of the learned face representation.

Very Deep Convolutional Networks for Large-Scale Image Recognition

This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.