• Corpus ID: 44119099

Collaborative Learning for Deep Neural Networks

@article{Song2018CollaborativeLF,
  title={Collaborative Learning for Deep Neural Networks},
  author={Guocong Song and Wei Chai},
  journal={ArXiv},
  year={2018},
  volume={abs/1805.11761}
}
We introduce collaborative learning in which multiple classifier heads of the same network are simultaneously trained on the same training data to improve generalization and robustness to label noise with no extra inference cost. It acquires the strengths from auxiliary training, multi-task learning and knowledge distillation. There are two important mechanisms involved in collaborative learning. First, the consensus of multiple views from different classifier heads on the same example provides… 
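Below is a minimal sketch of the multi-head training setup the abstract describes, assuming a PyTorch-style implementation; the tiny backbone, the head count, the temperature T, and the loss weight BETA are illustrative assumptions, not the authors' configuration.

```python
# Hedged sketch: several classifier heads share one backbone and are trained
# together; each head fits the labels and is also pulled toward the detached
# average prediction of the other heads (the "consensus"). Module names and
# the hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_HEADS, NUM_CLASSES, BETA, T = 3, 10, 1.0, 2.0  # assumed hyperparameters

class MultiHeadNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(          # shared feature extractor
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.heads = nn.ModuleList(             # independent classifier heads
            [nn.Linear(32, NUM_CLASSES) for _ in range(NUM_HEADS)])

    def forward(self, x):
        feat = self.backbone(x)
        return [head(feat) for head in self.heads]   # one logit tensor per head

def collaborative_loss(logits_list, targets):
    """Cross-entropy per head plus a soft consensus term: KL toward the
    detached average of the other heads' temperature-softened outputs."""
    total = 0.0
    for i, logits in enumerate(logits_list):
        ce = F.cross_entropy(logits, targets)
        others = [l.detach() for j, l in enumerate(logits_list) if j != i]
        consensus = torch.stack([F.softmax(l / T, dim=1) for l in others]).mean(0)
        kl = F.kl_div(F.log_softmax(logits / T, dim=1), consensus,
                      reduction="batchmean") * (T * T)
        total = total + ce + BETA * kl
    return total / len(logits_list)

# toy usage
model = MultiHeadNet()
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, NUM_CLASSES, (8,))
loss = collaborative_loss(model(x), y)
loss.backward()
```

At test time only a single head would be kept, consistent with the abstract's claim of no extra inference cost.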

Citations

Intra-Model Collaborative Learning of Neural Networks
  • Shijie Fang, Tong Lin
  • Computer Science
    2021 International Joint Conference on Neural Networks (IJCNN)
  • 2021
TLDR
This paper proposes four ways of collaborative learning among different parts of a single network with negligible engineering efforts, and leverages the consistency of the output layer and intermediate layers for training under the collaborative learning framework to improve the robustness of the network.
Distilled Hierarchical Neural Ensembles with Adaptive Inference Cost
TLDR
HNE is proposed, a novel framework that embeds an ensemble of multiple networks by sharing intermediate layers in a hierarchical structure; its second contribution is a novel co-distillation method that boosts the performance of ensemble predictions at low inference cost.
Cooperative Learning for Noisy Supervision
TLDR
A Cooperative Learning (CooL) framework for noisy supervision is proposed that analytically explains the benefits of leveraging dual or multiple networks and yields more reliable risk minimization on unseen clean data.
Deep learning with noisy supervision
TLDR
This dissertation explores the fundamental problems in training deep neural networks with noisy supervision and introduces a Latent Class-Conditional Noise model, which achieves state-of-the-art results on two toy datasets and two large real-world datasets.
A robust approach for deep neural networks in presence of label noise: relabelling and filtering instances during training
TLDR
A robust training strategy against label noise, called RAFNI, that can be used with any CNN is proposed; compared with state-of-the-art models on the CIFAR10 and CIFAR100 benchmarks, RAFNI achieves better results in most cases.
MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks
TLDR
It is shown that binary mixing in features, particularly with rectangular patches from CutMix, enhances results by making subnetworks stronger and more diverse, and opens a new line of research complementary to previous works.
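A minimal sketch of the binary rectangular mixing this TLDR refers to, with a CutMix-style patch pasted from one feature map into another; the Beta prior on the patch area and the helper names are assumptions, not MixMo's reference implementation.

```python
# Hedged sketch: a random rectangular mask swaps a patch between two feature
# tensors, and the kept-area ratio can be used to reweight the losses.
# The Beta(2, 2) prior and function names are illustrative assumptions.
import torch

def rectangular_mix(feat_a, feat_b, alpha=2.0):
    """Mix two feature tensors of shape (N, C, H, W) with a binary rectangle."""
    n, c, h, w = feat_a.shape
    lam = torch.distributions.Beta(alpha, alpha).sample().item()   # kept-area ratio
    cut_h, cut_w = int(h * (1 - lam) ** 0.5), int(w * (1 - lam) ** 0.5)
    cy, cx = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mixed = feat_a.clone()
    mixed[:, :, y1:y2, x1:x2] = feat_b[:, :, y1:y2, x1:x2]   # binary patch swap
    kept = 1.0 - ((y2 - y1) * (x2 - x1)) / float(h * w)      # for loss reweighting
    return mixed, kept

mixed, kept = rectangular_mix(torch.randn(4, 64, 16, 16), torch.randn(4, 64, 16, 16))
```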
Knowledge Transfer via Dense Cross-Layer Mutual-Distillation
TLDR
This paper proposes Dense Cross-layer Mutual-distillation (DCM), an improved two-way KT method in which the teacher and student networks are trained collaboratively from scratch, and introduces dense bidirectional KD operations between the layers appended with classifiers.
Deep Collaborative Learning for Randomly Wired Neural Networks
TLDR
The experimental results show that the collaborative training significantly improved the generalization of each model, which allowed for obtaining a small model that can mimic the performance of a large model and produce a more robust ensemble approach.
Self-Guidance: Improve Deep Neural Network Generalization via Knowledge Distillation
  • Zhenzhu Zheng, X. Peng
  • Computer Science
    2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
  • 2022
TLDR
The basic idea is to train a sub-network to match the prediction of the full network, so-called "Self-Guidance", under the "teacher-student" framework, which improves the generalization ability of deep neural networks by a significant margin.
All at Once Network Quantization via Collaborative Knowledge Transfer
TLDR
A novel collaborative knowledge transfer approach is proposed for efficiently training the all-at-once quantization network, using an adaptive selection strategy to choose a high-precision "teacher" for transferring knowledge to the low-precision "student" while jointly optimizing the model with all bit-widths.
...

References

SHOWING 1-10 OF 27 REFERENCES
Deep Mutual Learning
TLDR
Surprisingly, it is revealed that no prior powerful teacher network is necessary: mutual learning of a collection of simple student networks works, and moreover outperforms distillation from a more powerful yet static teacher.
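A minimal sketch of one mutual-learning step under the setup this TLDR describes: two independently parameterized students each minimize cross-entropy plus a KL term toward the other's detached predictions, so no pretrained teacher is needed. The tiny linear models, the optimizer, and the unit KL weight are illustrative assumptions.

```python
# Hedged sketch of deep mutual learning between two small student networks.
import torch
import torch.nn as nn
import torch.nn.functional as F

net1 = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))
net2 = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))
opt = torch.optim.SGD(list(net1.parameters()) + list(net2.parameters()), lr=0.1)

def mutual_step(x, y):
    z1, z2 = net1(x), net2(x)
    # each student fits the labels and mimics the other's (detached) softmax
    loss1 = F.cross_entropy(z1, y) + F.kl_div(
        F.log_softmax(z1, dim=1), F.softmax(z2.detach(), dim=1), reduction="batchmean")
    loss2 = F.cross_entropy(z2, y) + F.kl_div(
        F.log_softmax(z2, dim=1), F.softmax(z1.detach(), dim=1), reduction="batchmean")
    opt.zero_grad()
    (loss1 + loss2).backward()
    opt.step()

mutual_step(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))
```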
Temporal Ensembling for Semi-Supervised Learning
TLDR
Self-ensembling is introduced, in which an ensemble prediction is formed from the network's outputs over different training epochs; this ensemble prediction can be expected to be a better predictor for the unknown labels than the output of the network at the most recent training epoch, and can thus be used as a target for training.
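A minimal sketch of the ensemble target this TLDR describes: an exponential moving average of each sample's past predictions, bias-corrected and used as an unsupervised MSE target. The momentum ALPHA, the unsupervised weight, and the buffer size are assumed values.

```python
# Hedged sketch of temporal-ensembling targets for a dataset of 1000 samples.
import torch
import torch.nn.functional as F

ALPHA, W_UNSUP = 0.6, 1.0
ensemble = torch.zeros(1000, 10)   # accumulated predictions, one row per sample

def temporal_ensembling_loss(logits, sample_idx, epoch):
    probs = F.softmax(logits, dim=1)
    # update the running ensemble for these samples with the latest predictions
    ensemble[sample_idx] = ALPHA * ensemble[sample_idx] + (1 - ALPHA) * probs.detach()
    target = ensemble[sample_idx] / (1 - ALPHA ** (epoch + 1))   # bias correction
    return W_UNSUP * F.mse_loss(probs, target)

loss = temporal_ensembling_loss(torch.randn(8, 10), torch.arange(8), epoch=0)
```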
FitNets: Hints for Thin Deep Nets
TLDR
This paper extends the idea of a student network that could imitate the soft output of a larger teacher network or ensemble of networks, using not only the outputs but also the intermediate representations learned by the teacher as hints to improve the training process and final performance of the student.
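A minimal sketch of a FitNets-style hint loss as described above: a small regressor maps the thin student's intermediate features to the teacher's hint layer and an L2 loss aligns them. The channel counts and the 1x1-conv regressor are assumptions for illustration.

```python
# Hedged sketch: align a student's intermediate features with a teacher's
# hint layer through a learned regressor and an L2 (MSE) loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

student_feat = torch.randn(8, 32, 16, 16)             # guided layer (student)
teacher_feat = torch.randn(8, 128, 16, 16).detach()   # hint layer (teacher)
regressor = nn.Conv2d(32, 128, kernel_size=1)         # matches channel widths

hint_loss = F.mse_loss(regressor(student_feat), teacher_feat)
```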
Born Again Neural Networks
TLDR
This work studies KD from a new perspective: rather than compressing models, students are parameterized identically to their teachers, and significant advantages are shown from transferring knowledge between DenseNets and ResNets in either direction.
Large scale distributed neural network training through online distillation
TLDR
This paper claims that online distillation is a cost-effective way to make the exact predictions of a model dramatically more reproducible, and that it can still speed up training even after reaching the point at which additional parallelism provides no benefit for synchronous or asynchronous stochastic gradient descent.
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
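A minimal sketch of the residual idea the TLDR refers to: a block learns a residual function F(x) and outputs F(x) + x via an identity shortcut. The channel width and layer arrangement are illustrative assumptions.

```python
# Hedged sketch of a basic residual block with an identity shortcut.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = self.bn2(self.conv2(F.relu(self.bn1(self.conv1(x)))))
        return F.relu(out + x)   # identity shortcut: output is F(x) + x

y = BasicBlock()(torch.randn(2, 64, 8, 8))
```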
Deep Multi-task Representation Learning: A Tensor Factorisation Approach
TLDR
A new deep multi-task representation learning framework is proposed that learns cross-task sharing structure at every layer in a deep network, generalising the matrix factorisation techniques used explicitly or implicitly by many conventional MTL algorithms to tensor factorisation, so that end-to-end knowledge sharing in deep networks is learned automatically.
Identity Mappings in Deep Residual Networks
TLDR
The propagation formulations behind the residual building blocks suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation.
Distilling the Knowledge in a Neural Network
TLDR
This work shows that it can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model and introduces a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse.
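A minimal sketch of the distillation objective this reference describes: the student matches the teacher's temperature-softened distribution (KL term, scaled by T squared) while also fitting the hard labels. The temperature and mixing weight are assumed settings, not values from the paper.

```python
# Hedged sketch of a soft-target distillation loss with temperature T.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, weight=0.9):
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits.detach() / T, dim=1),
                    reduction="batchmean") * (T * T)     # match softened teacher
    hard = F.cross_entropy(student_logits, targets)       # fit the true labels
    return weight * soft + (1 - weight) * hard

loss = distillation_loss(torch.randn(8, 10), torch.randn(8, 10),
                         torch.randint(0, 10, (8,)))
```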
mixup: Beyond Empirical Risk Minimization
TLDR
This work proposes mixup, a simple learning principle that trains a neural network on convex combinations of pairs of examples and their labels, which improves the generalization of state-of-the-art neural network architectures.
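A minimal sketch of the convex combination the TLDR describes: each batch is replaced by mixtures of shuffled example pairs, and the loss is interpolated with the same coefficient. The Beta(0.2, 0.2) parameter is an assumed setting.

```python
# Hedged sketch of mixup: convex combinations of inputs and of the loss.
import torch
import torch.nn.functional as F

def mixup_batch(x, y, alpha=0.2):
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    mixed_x = lam * x + (1 - lam) * x[perm]   # convex combination of inputs
    return mixed_x, y, y[perm], lam           # labels are mixed inside the loss

def mixup_loss(logits, y_a, y_b, lam):
    return lam * F.cross_entropy(logits, y_a) + (1 - lam) * F.cross_entropy(logits, y_b)

mx, ya, yb, lam = mixup_batch(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))
```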
...