Collaborative Group Learning

Shaoxiong Feng, Hongshen Chen, Xuancheng Ren, Zhuoye Ding, Kan Li, Xu Sun

Collaborative learning has successfully applied knowledge transfer to guide a pool of small student networks towards robust local minima. However, previous approaches typically struggle with drastically aggravated student homogenization when the number of students rises. In this paper, we propose Collaborative Group Learning, an efficient framework that aims to diversify the feature representation and conduct an effective regularization. Intuitively, similar to the human group study mechanism… 


Decentralized Federated Learning via Mutual Knowledge Transfer

The proposed Def-KT algorithm significantly outperforms the baseline DFL methods with model averaging, i.e., Combo and FullAvg, especially when the training data are not independent and identically distributed (non-IID) across different clients.

CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing

This work proposes a consistency-regularized ensemble learning approach based on perturbed models, named CAMERO, which shares the weights of bottom layers across all models and applies different perturbations to the hidden representations for different models to effectively promote the model diversity.
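The summary names the three ingredients (shared bottom layers, per-model perturbations of the hidden representation, and a consistency regularizer) without giving formulas. The sketch below is a minimal illustration under stated assumptions, not CAMERO's actual implementation. It uses inverted dropout as a stand-in perturbation and a mean-squared deviation from the ensemble-mean prediction as the consistency term. The names `perturb`, `perturbed_ensemble`, and `consistency_loss` are hypothetical.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def perturb(hidden, rate, rng):
    # Hypothetical perturbation: inverted dropout on a copy of the shared hidden vector.
    # Assumes 0 <= rate < 1.
    return [0.0 if rng.random() < rate else h / (1.0 - rate) for h in hidden]


def perturbed_ensemble(shared_hidden, heads, rate=0.1, seed=0):
    """Run every model head on its own perturbed copy of the shared bottom-layer output.

    heads: one weight matrix per model, given as a list of rows (one row per class).
    """
    rng = random.Random(seed)
    probs = []
    for weights in heads:
        h = perturb(shared_hidden, rate, rng)
        logits = [sum(w * x for w, x in zip(row, h)) for row in weights]
        probs.append(softmax(logits))
    return probs


def consistency_loss(probs):
    # Assumed form: mean squared deviation of each model's distribution from the ensemble mean.
    k, n = len(probs), len(probs[0])
    mean = [sum(p[j] for p in probs) / k for j in range(n)]
    return sum(sum((p[j] - mean[j]) ** 2 for j in range(n)) for p in probs) / k
```

Because the bottom layers are computed once and only the perturbation differs per model, diversity comes from the perturbations, while the consistency term pulls the perturbed predictions back toward agreement.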



Online Knowledge Distillation with Diverse Peers

Experimental results show that the proposed framework consistently gives better performance than state-of-the-art approaches without sacrificing training or inference complexity, demonstrating the effectiveness of the proposed two-level distillation framework.

Deep Mutual Learning

Surprisingly, it is revealed that no prior powerful teacher network is necessary: mutual learning among a collection of simple student networks works, and moreover outperforms distillation from a more powerful yet static teacher.
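In mutual learning, each student is trained on the labels and also to mimic its peers' predicted class distributions. The sketch below computes, for each student, cross-entropy plus the average KL divergence from every peer's distribution to its own. It uses plain Python on single examples, with no autodiff. In a real training loop the peers' probabilities would be treated as fixed targets for each student's update. Function names are illustrative.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_div(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions; terms with p_i == 0 contribute nothing.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(p, label, eps=1e-12):
    # Negative log-probability of the true class.
    return -math.log(p[label] + eps)


def dml_losses(logits_per_student, label):
    """Per-student loss: supervised cross-entropy plus the mean KL from each peer's
    prediction to this student's prediction (peers act as fixed targets)."""
    probs = [softmax(z) for z in logits_per_student]
    k = len(probs)
    losses = []
    for i, p_i in enumerate(probs):
        mimicry = sum(kl_div(p_j, p_i) for j, p_j in enumerate(probs) if j != i)
        losses.append(cross_entropy(p_i, label) + mimicry / (k - 1))
    return losses
```

When all students already agree, the mimicry term vanishes and each loss reduces to plain cross-entropy. Disagreement adds a penalty that pushes each student toward its peers.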

Collaborative Learning for Deep Neural Networks

The empirical results on CIFAR and ImageNet datasets demonstrate that deep neural networks learned as a group in a collaborative way significantly reduce the generalization error and increase the robustness to label noise.

FitNets: Hints for Thin Deep Nets

This paper extends the idea of a student network that could imitate the soft output of a larger teacher network or ensemble of networks, using not only the outputs but also the intermediate representations learned by the teacher as hints to improve the training process and final performance of the student.
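A hint pairs a teacher's intermediate layer with a student's "guided" layer. Because the two layers can have different sizes, a learned regressor maps the student's representation onto the teacher's before an L2 penalty is applied. The sketch below uses flattened vectors and a fully connected regressor. That is a simplifying assumption: FitNets uses a convolutional regressor on feature maps, trains the student up to the guided layer on this hint loss first, and then applies standard distillation. `linear` and `hint_loss` are illustrative names.

```python
def linear(weights, x):
    # Fully connected map: weights is a list of rows, one row per output unit.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def hint_loss(teacher_hint, student_guided, regressor):
    """0.5 * squared L2 distance between the teacher's hint layer and the
    regressor applied to the student's guided layer (both flattened to vectors)."""
    pred = linear(regressor, student_guided)
    if len(pred) != len(teacher_hint):
        raise ValueError("regressor output size must match the hint size")
    return 0.5 * sum((t - p) ** 2 for t, p in zip(teacher_hint, pred))
```

Only the student and the regressor would receive gradients from this loss. The teacher's hint is a fixed target.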

Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data

Generative Teaching Networks may represent a first step toward the ambitious goal of algorithms that generate their own training data and, in doing so, open a variety of interesting new research questions and directions.

A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning

A novel technique for knowledge transfer in which knowledge from a pretrained deep neural network (DNN) is distilled and transferred to another DNN; the student DNN that learns the distilled knowledge is optimized much faster than the original model and outperforms the original DNN.

Variational Information Distillation for Knowledge Transfer

An information-theoretic framework for knowledge transfer is proposed that formulates knowledge transfer as maximizing the mutual information between the teacher and student networks, and it consistently outperforms existing methods.
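Mutual information between teacher features t and student features s is intractable to compute directly. A standard variational bound gives I(t; s) ≥ H(t) + E[log q(t | s)] for any distribution q. Since H(t) does not depend on the student, maximizing the bound amounts to minimizing the negative log-likelihood of the teacher's features under q. The sketch below assumes q is a Gaussian whose mean is predicted from the student and whose per-dimension variance comes from a learnable parameter passed through softplus. The names and the `min_var` floor are assumptions for illustration.

```python
import math


def softplus(x):
    # log(1 + e^x), computed stably for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def vid_loss(teacher_feat, student_mu, alpha, min_var=1e-6):
    """Mean per-dimension negative log-likelihood of teacher features under a Gaussian
    q(t | s) = N(mu(s), sigma^2), dropping the constant 0.5 * log(2 * pi).

    student_mu: the student's predicted mean for each teacher dimension.
    alpha: learnable per-dimension parameters; variance = softplus(alpha) + min_var.
    """
    total = 0.0
    for t, mu, a in zip(teacher_feat, student_mu, alpha):
        var = softplus(a) + min_var
        total += 0.5 * math.log(var) + (t - mu) ** 2 / (2.0 * var)
    return total / len(teacher_feat)
```

The learned variance lets the loss down-weight teacher dimensions the student cannot predict well, instead of penalizing every mismatch equally as a plain L2 loss would.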

Knowledge Distillation by On-the-Fly Native Ensemble

This work presents an On-the-fly Native Ensemble (ONE) strategy for one-stage online distillation that improves the generalisation performance of a variety of deep neural networks more significantly than alternative methods on four image classification datasets.
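In a native ensemble, several branches of one network produce logits, and a gate mixes them into an ensemble that then serves as the teacher for every branch within the same training run. The sketch below assumes softmax gate weights over branches, cross-entropy on each branch and on the ensemble, and temperature-scaled KL from the ensemble's soft targets to each branch, multiplied by T². Here `gate_logits` is passed in directly; in a network it would be computed from shared features. The exact weighting of terms is an assumption, and the ensemble is treated as a constant target.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_div(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(p, label, eps=1e-12):
    return -math.log(p[label] + eps)


def one_loss(branch_logits, gate_logits, label, temperature=3.0):
    """Cross-entropy on every branch and on the gated ensemble, plus temperature-scaled
    KL from the ensemble's soft targets to each branch."""
    gates = softmax(gate_logits)  # one weight per branch, summing to 1
    n_cls = len(branch_logits[0])
    ensemble = [sum(g * z[c] for g, z in zip(gates, branch_logits)) for c in range(n_cls)]
    ce = sum(cross_entropy(softmax(z), label) for z in branch_logits)
    ce += cross_entropy(softmax(ensemble), label)
    soft_targets = softmax([v / temperature for v in ensemble])
    kd = sum(kl_div(soft_targets, softmax([v / temperature for v in z])) for z in branch_logits)
    return ce + temperature ** 2 * kd
```

Scaling the distillation term by T² keeps its gradient magnitude roughly comparable to the cross-entropy terms as the temperature changes.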

Convergent Learning: Do different neural networks learn the same representations?

This paper investigates the extent to which neural networks exhibit convergent learning, that is, whether the representations learned by multiple nets converge to a set of features that are either individually similar between networks or whose subsets span similar low-dimensional spaces.
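One simple way to test whether two networks learned individually similar features is to record every unit's activations on the same inputs and correlate units across the networks. The sketch below assumes Pearson correlation and a "semi-matching" in which each unit of network A is paired with its most correlated unit in network B, allowing many-to-one matches. This probes only the one-to-one case, not the subspace case. `pearson` and `semi_match` are illustrative names.

```python
import math


def pearson(x, y):
    # Pearson correlation of two equal-length activation vectors (0.0 if either is constant).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def semi_match(acts_a, acts_b):
    """acts_a[i] holds unit i's activations in net A over the same inputs as each acts_b[j].

    Returns (i, j, corr) with j the most correlated unit of net B for each unit i of net A.
    """
    matches = []
    for i, unit_a in enumerate(acts_a):
        corrs = [pearson(unit_a, unit_b) for unit_b in acts_b]
        j = max(range(len(corrs)), key=corrs.__getitem__)
        matches.append((i, j, corrs[j]))
    return matches
```

High best-match correlations suggest that both nets learned the same individual features. Low ones leave open whether the features agree only up to a mixing within a subspace.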

Random Path Selection for Continual Learning

This paper proposes a random path selection algorithm, called RPS-Net, that progressively chooses optimal paths for new tasks while encouraging parameter sharing and reuse, together with a simple controller that dynamically balances model plasticity.