Learning Deep Representations with Probabilistic Knowledge Transfer

@inproceedings{Passalis2018LearningDR,
  title={Learning Deep Representations with Probabilistic Knowledge Transfer},
  author={Nikolaos Passalis and Anastasios Tefas},
  booktitle={ECCV},
  year={2018}
}
Knowledge Transfer (KT) techniques tackle the problem of transferring the knowledge from a large and complex neural network into a smaller and faster one. However, existing KT methods are tailored towards classification tasks and they cannot be used efficiently for other representation learning tasks. In this paper we propose a novel probabilistic knowledge transfer method that works by matching the probability distribution of the data in the feature space instead of their actual representation… 
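To illustrate the distribution-matching idea described in the abstract, a minimal PyTorch-style sketch follows: the batch-wise feature distributions of the teacher and the student are estimated from pairwise similarities and matched with a KL divergence. The cosine kernel, the rescaling to [0, 1], and the function names are assumptions made for this sketch, not details taken from the truncated abstract.

import torch
import torch.nn.functional as F

def feature_distribution(features, eps=1e-7):
    # Estimate the conditional probability of picking sample j given sample i
    # from pairwise cosine similarities inside the batch (each row sums to 1).
    normed = F.normalize(features, p=2, dim=1)
    sim = torch.mm(normed, normed.t())                  # cosine similarities in [-1, 1]
    sim = (sim + 1.0) / 2.0                             # rescale to [0, 1]
    sim = sim * (1.0 - torch.eye(sim.size(0), device=features.device))  # drop self-pairs
    return sim / (sim.sum(dim=1, keepdim=True) + eps)

def probabilistic_transfer_loss(teacher_feats, student_feats, eps=1e-7):
    # KL divergence between the teacher's and the student's batch-wise feature-space
    # distributions: the student mimics the teacher's geometry, not its raw features.
    p_t = feature_distribution(teacher_feats, eps)
    p_s = feature_distribution(student_feats, eps)
    return (p_t * torch.log((p_t + eps) / (p_s + eps))).sum(dim=1).mean()

Because only pairwise relations are matched, the teacher and student feature dimensionalities do not have to agree, which is part of what makes such an approach usable for representation learning tasks beyond classification.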
Probabilistic Knowledge Transfer for Lightweight Deep Representation Learning
Knowledge-transfer (KT) methods allow for transferring the knowledge contained in a large deep learning model into a more lightweight and faster model. However, the vast majority of existing KT…
QUEST: Quantized embedding space for transferring knowledge
TLDR
This work proposes a novel way to achieve knowledge distillation: distilling the knowledge through a quantized space, in which the teacher's feature maps are quantized to represent the main visual concepts they encompass.
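As a rough, hedged illustration of distillation through a quantized space (not necessarily the paper's exact formulation), teacher and student features can be softly assigned to a shared codebook of visual concepts and the student trained to reproduce the teacher's assignments. The clustering-based codebook, the distance-based soft assignment, the temperature tau, and the requirement that student features be projected to the teacher's dimensionality are all illustrative assumptions.

import torch
import torch.nn.functional as F

def quantized_space_distillation(teacher_feats, student_feats, codebook, tau=1.0, eps=1e-7):
    # teacher_feats, student_feats: (B, D); codebook: (K, D) "visual concepts",
    # e.g. obtained beforehand by clustering teacher features. Student features
    # are assumed to be projected to dimension D already.
    def soft_assign(feats):
        dists = torch.cdist(feats, codebook)            # (B, K) Euclidean distances
        return F.softmax(-dists / tau, dim=1)           # closer codewords get more mass
    q_teacher = soft_assign(teacher_feats)
    q_student = soft_assign(student_feats)
    # The student mimics the teacher's assignment over the quantized space.
    return (q_teacher * torch.log((q_teacher + eps) / (q_student + eps))).sum(dim=1).mean()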
Knowledge Distillation By Sparse Representation Matching
TLDR
SRM is proposed, a method to transfer intermediate knowledge obtained from one Convolutional Neural Network to another by utilizing sparse representation learning, and is formulated as a neural processing block which can be efficiently optimized using stochastic gradient descent and integrated into any CNN in a plug-and-play manner.
Efficient Online Subclass Knowledge Distillation for Image Classification
TLDR
A novel single-stage self-knowledge distillation method, namely Online Subclass Knowledge Distillation (OSKD), is proposed that aims at revealing the similarities inside classes so as to improve the performance of any deep neural model in an online manner.
Similarity Transfer for Knowledge Distillation
TLDR
A novel method, called similarity transfer for knowledge distillation (STKD), is proposed that aims to fully utilize the similarities between categories across multiple samples and improves the performance of the student model, since the virtual sample created from multiple images produces a similar probability distribution in the teacher and student networks.
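A hedged sketch of the virtual-sample idea summarized above: several images are blended into one virtual sample and the student is trained so that its softened output distribution on that sample matches the teacher's. The pairwise mixup-style blending, the mixing coefficient alpha, and the temperature are assumptions for this sketch, not details given in the summary.

import torch
import torch.nn.functional as F

def virtual_sample_distillation(images, teacher, student, alpha=0.5, temperature=4.0):
    # Blend each image with a randomly chosen partner from the batch to form
    # virtual samples (the pairwise blending scheme is an assumption).
    perm = torch.randperm(images.size(0), device=images.device)
    virtual = alpha * images + (1.0 - alpha) * images[perm]
    with torch.no_grad():
        p_teacher = F.softmax(teacher(virtual) / temperature, dim=1)
    log_p_student = F.log_softmax(student(virtual) / temperature, dim=1)
    # The student mimics the teacher's softened distribution on the virtual sample.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2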
InDistill: Transferring Knowledge From Pruned Intermediate Layers
TLDR
This paper proposes a novel method, termed InDistill, that can drastically improve the performance of existing single-layer knowledge distillation methods by leveraging the properties of channel pruning to both reduce the capacity gap between the models and retain the architectural alignment.
Heterogeneous Knowledge Distillation Using Information Flow Modeling
TLDR
This paper proposes a novel KD method that works by modeling the information flow through the various layers of the teacher model and then training a student model to mimic this information flow.
Local Region Knowledge Distillation
TLDR
Local linear region knowledge distillation (LRKD) is proposed, which transfers the knowledge in local, linear regions from a teacher to a student and enforces the student to mimic the local shape of the teacher function in those regions.
Contrastive Representation Distillation
TLDR
The resulting new objective outperforms knowledge distillation and other cutting-edge distillers on a variety of knowledge transfer tasks, including single model compression, ensemble distillation, and cross-modal transfer.
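A simplified in-batch contrastive sketch of the idea (the paper's own formulation draws negatives from a large memory rather than from the current batch); the InfoNCE-style loss, the assumption of a shared projection dimension, and the temperature are illustrative choices.

import torch
import torch.nn.functional as F

def contrastive_distillation_loss(teacher_feats, student_feats, temperature=0.1):
    # teacher_feats, student_feats: (B, D), already projected to a shared dimension.
    t = F.normalize(teacher_feats, dim=1)
    s = F.normalize(student_feats, dim=1)
    logits = torch.mm(s, t.t()) / temperature           # (B, B) student-teacher similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Each student embedding should be most similar to its own teacher embedding
    # (positive pair) and dissimilar to the other samples in the batch (negatives).
    return F.cross_entropy(logits, targets)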
...

References

Showing 1-10 of 54 references
Unsupervised Knowledge Transfer Using Similarity Embeddings
  • N. Passalis, A. Tefas
  • Computer Science
    IEEE Transactions on Neural Networks and Learning Systems
  • 2019
TLDR
The proposed method is the first to utilize similarity-induced embeddings to transfer knowledge between any two layers of neural networks, regardless of the number of neurons in each, and it is demonstrated that the knowledge of a neural network can be successfully transferred using different kinds of data.
Net2Net: Accelerating Learning via Knowledge Transfer
TLDR
The Net2Net technique accelerates the experimentation process by instantaneously transferring the knowledge from a previous network to each new, deeper or wider network, and achieves a new state-of-the-art accuracy on the ImageNet dataset.
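A minimal sketch of the function-preserving widening operation (Net2WiderNet) that Net2Net is built around, written for a single fully connected hidden layer; the NumPy formulation and the names are illustrative assumptions.

import numpy as np

def net2wider(w_in, b_in, w_out, new_width, rng=None):
    # w_in: (d_in, h), b_in: (h,), w_out: (h, d_out); widen the hidden layer from
    # h to new_width units while preserving the function the network computes.
    if rng is None:
        rng = np.random.default_rng()
    h = w_in.shape[1]
    # Keep the original units and replicate randomly chosen ones for the new slots.
    mapping = np.concatenate([np.arange(h), rng.integers(0, h, new_width - h)])
    counts = np.bincount(mapping, minlength=h)           # replication count per unit
    w_in_new = w_in[:, mapping]
    b_in_new = b_in[mapping]
    # Divide outgoing weights by the replication count so that the summed
    # contribution of each replicated unit stays unchanged.
    w_out_new = w_out[mapping, :] / counts[mapping][:, None]
    return w_in_new, b_in_new, w_out_new

Because the widened network computes exactly the same function, training can continue from the transferred weights instead of starting from scratch.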
A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning
TLDR
A novel technique for knowledge transfer is proposed, in which knowledge from a pretrained deep neural network (DNN) is distilled and transferred to another DNN; the student DNN that learns the distilled knowledge is optimized much faster than the original model and outperforms it.
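The transferred knowledge here is the "flow of solution procedure" (FSP) between pairs of layers, summarized by a matrix of channel-wise inner products; a hedged sketch, assuming equal spatial sizes for the two feature maps and an L2 matching loss, follows.

import torch

def fsp_matrix(feat_a, feat_b):
    # feat_a: (B, C1, H, W), feat_b: (B, C2, H, W) with matching spatial size.
    b, c1, h, w = feat_a.shape
    fa = feat_a.reshape(b, c1, h * w)
    fb = feat_b.reshape(b, -1, h * w)
    return torch.bmm(fa, fb.transpose(1, 2)) / (h * w)   # (B, C1, C2) layer-to-layer flow

def fsp_loss(teacher_a, teacher_b, student_a, student_b):
    # The student is trained so that its flow between two layers matches the teacher's.
    return ((fsp_matrix(teacher_a, teacher_b) - fsp_matrix(student_a, student_b)) ** 2).mean()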
Simultaneous Deep Transfer Across Domains and Tasks
TLDR
This work proposes a new CNN architecture that exploits unlabeled and sparsely labeled target-domain data, simultaneously optimizes for domain invariance to facilitate domain transfer, and uses a soft label distribution matching loss to transfer information between tasks.
Knowledge Transfer Pre-training
TLDR
This paper presents a new pre-training approach based on knowledge transfer learning, which trains the entire model as a whole but with an easier objective function, by utilizing soft targets produced by a previously trained model (the teacher model).
Recurrent neural network training with dark knowledge transfer
TLDR
The knowledge transfer learning approach is employed to train RNNs (specifically LSTMs) using a deep neural network (DNN) model as the teacher, and it works fairly well: without applying any tricks to the learning scheme, this approach can train RNNs successfully even with limited training data.
DSOD: Learning Deeply Supervised Object Detectors from Scratch
TLDR
Deeply Supervised Object Detector (DSOD) is proposed, a framework that can learn object detectors from scratch following the single-shot detection (SSD) framework; one of the key findings is that deep supervision, enabled by dense layer-wise connections, plays a critical role in learning a good detector.
Transferring knowledge from a RNN to a DNN
TLDR
A state-of-the-art RNN model is used to generate soft alignments, and the Kullback-Leibler divergence against the small DNN is minimized, resulting in a significant improvement on the Wall Street Journal eval92 task.
Multimodal learning with deep Boltzmann machines
TLDR
A Deep Boltzmann Machine is proposed for learning a generative model of multimodal data, and it is shown that the model can be used to create fused representations by combining features across modalities, which are useful for classification and information retrieval.
Distilling the Knowledge in a Neural Network
TLDR
This work shows that the acoustic model of a heavily used commercial system can be significantly improved by distilling the knowledge in an ensemble of models into a single model, and introduces a new type of ensemble composed of one or more full models and many specialist models that learn to distinguish fine-grained classes that the full models confuse.
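The soft-target objective underlying this work matches the teacher's and the student's predictions softened by a temperature; a minimal sketch is given below, where the temperature value and the T^2 scaling used when combining with a hard-label loss follow common practice rather than anything stated in the summary.

import torch.nn.functional as F

def soft_target_loss(student_logits, teacher_logits, temperature=4.0):
    # KL divergence between the teacher's and the student's softened predictions.
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # The T^2 factor keeps gradient magnitudes comparable to the hard-label loss.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2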
...