Weighted Ensemble Self-Supervised Learning

Yangjun Ruan, Saurabh Singh, Warren R. Morningstar, Alexander A. Alemi, Sergey Ioffe, Ian S. Fischer, Joshua V. Dillon
Ensembling has proven to be a powerful technique for boosting model performance, uncertainty estimation, and robustness in supervised learning. Advances in self-supervised learning (SSL) enable leveraging large unlabeled corpora for state-of-the-art few-shot and supervised learning performance. In this paper, we explore how ensemble methods can improve recent SSL techniques by developing a framework that permits data-dependent weighted cross-entropy losses. We refrain from ensembling the…



Temporal Ensembling for Semi-Supervised Learning

Self-ensembling is introduced: a consensus prediction of the unknown labels is formed from the network's outputs at different training epochs. This ensemble prediction can be expected to be a better predictor of the unknown labels than the output of the network at the most recent training epoch, and can thus be used as a target for training.
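As a rough illustration of how such an ensemble target can be maintained, the sketch below keeps an exponential moving average of per-epoch predictions and divides by a startup bias correction. The function name `update_ensemble`, the momentum value `alpha`, and the use of plain Python lists are assumptions made for this example, not the authors' code.

```python
def update_ensemble(accumulated, current, alpha, epoch):
    """Blend this epoch's predictions into the running ensemble.

    accumulated: running average of past predictions (starts at all zeros)
    current:     predictions from the network at this epoch
    alpha:       momentum in [0, 1); an assumed hyperparameter
    epoch:       1-indexed epoch counter, used for bias correction
    Returns (new_accumulated, targets).
    """
    new_accumulated = [
        alpha * old + (1.0 - alpha) * new
        for old, new in zip(accumulated, current)
    ]
    # The zero initialisation biases early averages toward zero; dividing
    # by (1 - alpha ** epoch) removes that bias.
    correction = 1.0 - alpha ** epoch
    targets = [value / correction for value in new_accumulated]
    return new_accumulated, targets


# Hypothetical class probabilities for one sample over two epochs.
state = [0.0, 0.0]
state, targets_epoch1 = update_ensemble(state, [0.2, 0.8], alpha=0.6, epoch=1)
state, targets_epoch2 = update_ensemble(state, [0.4, 0.6], alpha=0.6, epoch=2)
```

After the first epoch the corrected target equals the first prediction; later targets lie between the predictions seen so far, which is what makes them smoother than any single epoch's output.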

Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet?

ReLICv2 is the first unsupervised representation learning method to consistently outperform a standard supervised baseline in a like-for-like comparison across a wide range of ResNet architectures and is comparable to state-of-the-art self-supervised vision transformers.

Why M Heads are Better than One: Training a Diverse Ensemble of Deep Networks

It is demonstrated that TreeNets can improve ensemble performance and that diverse ensembles can be trained end-to-end under a unified loss, achieving significantly higher "oracle" accuracies than classical ensembles.

Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results

The recently proposed Temporal Ensembling has achieved state-of-the-art results in several semi-supervised learning benchmarks, but it becomes unwieldy when learning on large datasets, so Mean Teacher, a method that averages model weights instead of label predictions, is proposed.
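The weight-averaging idea can be sketched as an exponential moving average over the student's parameters. This is a minimal sketch, not the paper's implementation: the function name `ema_update`, the `decay` value, and the representation of parameters as a dict of floats are all assumptions chosen for brevity.

```python
def ema_update(teacher, student, decay=0.99):
    """Return new teacher weights as an EMA of the student weights.

    teacher, student: dicts mapping a parameter name to a float
                      (a stand-in for real tensors in this sketch)
    decay:            assumed smoothing coefficient in [0, 1)
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


# Hypothetical single-parameter model: the teacher drifts toward the student.
teacher = {"w": 1.0}
student = {"w": 0.0}
for _ in range(3):
    teacher = ema_update(teacher, student, decay=0.9)
```

Because the average is taken over weights rather than over stored per-example predictions, the update costs the same regardless of dataset size.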

Big Self-Supervised Models are Strong Semi-Supervised Learners

The proposed semi-supervised learning algorithm can be summarized in three steps: unsupervised pretraining of a big ResNet model using SimCLRv2 (a modification of SimCLR), supervised fine-tuning on a few labeled examples, and distillation with unlabeled examples for refining and transferring the task-specific knowledge.

Snapshot Ensembles: Train 1, get M for free

This paper proposes a method to achieve the seemingly contradictory goal of ensembling multiple neural networks at no additional training cost: a single neural network is trained so that it converges to several local minima along its optimization path, and the model parameters are saved at each one.

No One Representation to Rule Them All: Overlapping Features of Training Methods

A large-scale empirical study of models across hyper-parameters, architectures, frameworks, and datasets finds that model pairs that diverge more in training methodology display categorically different generalization behavior, producing increasingly uncorrelated errors.

Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors

A rank-1 parameterization of BNNs is proposed, where each weight matrix involves only a distribution on a rank-1 subspace, and the use of mixture approximate posteriors to capture multiple modes is revisited.

BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning

BatchEnsemble is proposed, an ensemble method whose computational and memory costs are significantly lower than those of typical ensembles and which can easily scale up to lifelong learning on Split-ImageNet, which involves 100 sequential learning tasks.

Self-labelling via simultaneous clustering and representation learning

The proposed novel and principled learning formulation is able to self-label visual data so as to train highly competitive image representations without manual labels and yields the first self-supervised AlexNet that outperforms the supervised Pascal VOC detection baseline.