Solving Inefficiency of Self-supervised Representation Learning

  @inproceedings{Wang2021SolvingIO,
    title={Solving Inefficiency of Self-supervised Representation Learning},
    author={Guangrun Wang and Keze Wang and Guangcong Wang and Philip H. S. Torr and Liang Lin},
    booktitle={2021 IEEE/CVF International Conference on Computer Vision (ICCV)},
    year={2021}
  }
Self-supervised learning (especially contrastive learning) has attracted great interest due to its huge potential in learning discriminative representations in an unsupervised manner. Despite the acknowledged successes, existing contrastive learning methods suffer from very low learning efficiency, e.g., taking about ten times more training epochs than supervised learning for comparable recognition accuracy. In this paper, we reveal two contradictory phenomena in contrastive learning that we… 
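
For context on the training cost at issue: most of the methods surveyed below optimise some variant of the InfoNCE instance-discrimination objective, which pulls two augmented views of an image together while pushing other images away. A minimal NumPy sketch of that objective (the batch size, embedding dimension, and temperature below are illustrative assumptions, not values from the paper):

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for a single anchor: pull the positive (an augmented
    view of the same image) close, push the negatives (other images) away."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a, p, n = normalize(anchor), normalize(positive), normalize(negatives)
    logits = np.concatenate([[a @ p], n @ a]) / temperature
    # cross-entropy with the positive sitting at index 0
    return -logits[0] + np.log(np.exp(logits).sum())

rng = np.random.default_rng(0)
anchor = rng.normal(size=8)
positive = anchor + 0.05 * rng.normal(size=8)  # augmented view: near the anchor
negatives = rng.normal(size=(16, 8))           # other instances in the batch
loss = info_nce(anchor, positive, negatives)
```

The loss is the negative log-probability of picking the positive out of the candidate set, so it is always strictly positive and shrinks as the anchor aligns with its positive.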


Similarity Contrastive Estimation for Self-Supervised Soft Contrastive Learning

This work proposes a novel formulation of contrastive learning based on semantic similarity between instances, called Similarity Contrastive Estimation (SCE), which estimates, from one view of a batch, a continuous distribution used to push or pull instances according to their semantic similarities.

Relational Self-Supervised Learning

This paper introduces a novel SSL paradigm, termed the relational self-supervised learning (ReSSL) framework, which learns representations by modeling the relationships between different instances, using a sharpened distribution of pairwise similarities among instances as the relation metric.
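
The relation-distillation idea summarised above can be sketched as follows. This is a simplified NumPy illustration under assumed shapes and temperatures, not the authors' implementation (ReSSL additionally uses a momentum teacher and a memory queue):

```python
import numpy as np

def relational_loss(student, teacher, memory, t_s=0.1, t_t=0.04):
    """Relation distillation in the spirit of ReSSL: the sharpened
    (lower-temperature) similarity distribution from the teacher view
    supervises the student view's distribution over memory instances."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    s, t, m = normalize(student), normalize(teacher), normalize(memory)
    p_s = softmax(m @ s / t_s)
    p_t = softmax(m @ t / t_t)           # t_t < t_s sharpens the teacher
    return -(p_t * np.log(p_s)).sum()    # cross-entropy H(p_t, p_s)

rng = np.random.default_rng(1)
student = rng.normal(size=16)
teacher = student + 0.1 * rng.normal(size=16)  # two views of one instance
memory = rng.normal(size=(64, 16))             # bank of other instances
loss = relational_loss(student, teacher, memory)
```

Unlike plain contrastive losses, no sample is treated as a hard negative; the relationship to every other instance is the supervisory signal.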

Joint Debiased Representation and Image Clustering Learning with Self-Supervision

A novel joint clustering and contrastive learning framework is developed by adapting the debiased contrastive loss to avoid under-clustering minority classes of imbalanced datasets and improves the performance across multiple datasets and learning tasks.

Improving Transferability of Representations via Augmentation-Aware Self-Supervision

An auxiliary self-supervised loss, coined AugSelf, is suggested that learns the difference in augmentation parameters between two randomly augmented samples and encourages preserving augmentation-aware information in the learned representations, which can be beneficial for their transferability.

Multiple Instance Learning via Iterative Self-Paced Supervised Contrastive Learning

A novel framework is proposed, Iterative Self-paced Supervised Contrastive Learning for MIL Representations (ItS2CLR), which improves the learned representation by exploiting instance-level pseudo labels derived from the bag-level labels.

On the Efficacy of Small Self-Supervised Contrastive Models without Distillation Signals

The validated techniques are combined to improve the baseline performance of five small architectures by considerable margins, which indicates that training small self-supervised contrastive models is feasible even without distillation signals.

Constrained Mean Shift Using Distant Yet Related Neighbors for Representation Learning

This work generalizes the mean-shift idea by constraining the search space of nearest neighbors (NNs) using another source of knowledge, so that the NNs are far from the query while still being semantically related.

CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations

A new pre-training strategy named ccc-wav2vec 2.0 is presented, which uses clustering and an augmentation-based cross-contrastive loss as its self-supervised objective, bringing robustness to the pre-training strategy.

A Generic Self-Supervised Framework of Learning Invariant Discriminative Features

A generic SSL framework based on a constrained self-labelling assignment process that prevents degenerate solutions is proposed, and the proposed training strategy outperforms a majority of state-of-the-art representation learning methods based on AE structures.

Negative Selection by Clustering for Contrastive Learning in Human Activity Recognition

A new contrastive learning framework for HAR that performs negative selection by clustering, called ClusterCLHAR, is proposed; it redefines the negative pairs in the contrastive loss function by using unsupervised clustering methods to generate soft labels that mask other samples of the same cluster, avoiding treating them as negative samples.
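
One plausible way to realise the cluster-based negative masking described above is to zero out same-cluster entries in the contrastive denominator. A NumPy sketch with assumed shapes and pseudo-labels, not the authors' code:

```python
import numpy as np

def cluster_masked_nce(emb, view, clusters, temperature=0.5):
    """NT-Xent-style loss where batch samples sharing a cluster pseudo-label
    with the anchor are masked out of the negatives; the diagonal (the
    anchor's own augmented view) is kept as the positive."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    z, zp = normalize(emb), normalize(view)
    sim = z @ zp.T / temperature                    # (N, N) cross-view sims
    same_cluster = clusters[:, None] == clusters[None, :]
    mask = same_cluster & ~np.eye(len(emb), dtype=bool)
    sim = np.where(mask, -np.inf, sim)              # exp(-inf) -> 0 below
    log_prob = sim.diagonal() - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()

rng = np.random.default_rng(2)
emb = rng.normal(size=(6, 4))
view = emb + 0.1 * rng.normal(size=(6, 4))
clusters = np.array([0, 0, 1, 1, 2, 2])  # e.g. k-means pseudo-labels
masked = cluster_masked_nce(emb, view, clusters)
unmasked = cluster_masked_nce(emb, view, np.arange(6))  # no shared clusters
```

Masking same-cluster samples strictly shrinks the denominator, so the masked loss is always below the unmasked one: likely same-class samples no longer get pushed apart.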



Self-labelling via simultaneous clustering and representation learning

The proposed novel and principled learning formulation is able to self-label visual data so as to train highly competitive image representations without manual labels and yields the first self-supervised AlexNet that outperforms the supervised Pascal VOC detection baseline.

Hierarchical Semantic Aggregation for Contrastive Representation Learning

This paper tackles the representation inefficiency of contrastive learning and proposes a hierarchical training strategy to explicitly model invariance to semantically similar images in a bottom-up way, producing more discriminative representations on several unsupervised benchmarks.

Boosting Contrastive Self-Supervised Learning with False Negative Cancellation

This paper proposes novel approaches to identify false negatives, as well as two strategies to mitigate their effect, i.e. false negative elimination and attraction, while systematically performing rigorous evaluations to study this problem in detail.

Unsupervised Learning of Visual Features by Contrasting Cluster Assignments

This paper proposes an online algorithm, SwAV, that takes advantage of contrastive methods without computing pairwise comparisons, using a swapped prediction mechanism that predicts the cluster assignment of a view from the representation of another view.
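
The swapped-prediction mechanism can be illustrated with a toy NumPy sketch. Note that real SwAV computes the codes with a Sinkhorn-Knopp equal-partition step and stops gradients through them; this simplification replaces that with a plain softmax:

```python
import numpy as np

def swapped_prediction_loss(z1, z2, prototypes, temperature=0.1):
    """Swapped prediction: the code (soft cluster assignment) of one view
    supervises the prediction made from the other view, and vice versa."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    def softmax(x):
        e = np.exp(x - x.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    z1, z2, c = normalize(z1), normalize(z2), normalize(prototypes)
    p1 = softmax(z1 @ c.T / temperature)   # sharpened predictions
    p2 = softmax(z2 @ c.T / temperature)
    q1 = softmax(z1 @ c.T)                 # codes (Sinkhorn step omitted)
    q2 = softmax(z2 @ c.T)
    ce = lambda q, p: -(q * np.log(p)).sum(axis=1).mean()
    return 0.5 * (ce(q2, p1) + ce(q1, p2))

rng = np.random.default_rng(3)
z1 = rng.normal(size=(8, 16))
z2 = z1 + 0.1 * rng.normal(size=(8, 16))   # second augmented view
prototypes = rng.normal(size=(4, 16))      # learnable cluster centres
loss = swapped_prediction_loss(z1, z2, prototypes)
```

Because views are compared only to a small set of prototypes rather than to each other, the cost no longer grows with the number of pairwise comparisons.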

What makes for good views for contrastive learning

This paper uses empirical analysis to better understand the importance of view selection, and argues that the mutual information (MI) between views should be reduced while keeping task-relevant information intact, and devise unsupervised and semi-supervised frameworks that learn effective views by aiming to reduce their MI.

Representation Learning with Contrastive Predictive Coding

This work proposes a universal unsupervised learning approach to extract useful representations from high-dimensional data, which it calls Contrastive Predictive Coding, and demonstrates that the approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments.

Debiased Contrastive Learning

A debiased contrastive objective is developed that corrects for the sampling of same-label datapoints, even without knowledge of the true labels, and consistently outperforms the state-of-the-art for representation learning in vision, language, and reinforcement learning benchmarks.
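
The debiasing correction can be sketched as follows, in the spirit of the objective summarised above: assume a class prior tau_plus (the probability that a sampled "negative" actually shares the anchor's latent class) and subtract the resulting positive contamination from the negative term. The shapes and constants are illustrative assumptions:

```python
import numpy as np

def debiased_nce(anchor, positive, negatives, tau_plus=0.1, temperature=0.5):
    """Debiased contrastive loss: the negative term is corrected for the
    chance tau_plus that a sampled 'negative' shares the anchor's class,
    then clipped at its theoretical floor exp(-1/temperature)."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a, p, n = normalize(anchor), normalize(positive), normalize(negatives)
    pos = np.exp(a @ p / temperature)
    neg = np.exp(n @ a / temperature)
    num_neg = len(neg)
    # corrected estimate of the true-negative term
    g = (neg.mean() - tau_plus * pos) / (1.0 - tau_plus)
    g = max(g, np.exp(-1.0 / temperature))
    return -np.log(pos / (pos + num_neg * g))

rng = np.random.default_rng(4)
anchor = rng.normal(size=8)
positive = anchor + 0.05 * rng.normal(size=8)
negatives = rng.normal(size=(16, 8))  # drawn without any label knowledge
loss = debiased_nce(anchor, positive, negatives)
```

With tau_plus=0 this reduces to the standard biased objective; larger priors discount the denominator more aggressively.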

A Simple Framework for Contrastive Learning of Visual Representations

It is shown that the composition of data augmentations plays a critical role in defining effective predictive tasks, that introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and that contrastive learning benefits from larger batch sizes and more training steps than supervised learning does.

Learning Representations by Predicting Bags of Visual Words

This work shows that the process of image discretization into visual words can provide the basis for very powerful self-supervised approaches in the image domain, thus allowing further connections to be made to related methods from the NLP domain that have been extremely successful so far.

Are all negatives created equal in contrastive instance discrimination?

It is found that negatives vary in importance and that CID may benefit from more intelligent negative treatment, and that some negatives were more consistently easy or hard than the authors would expect by chance.