Corpus ID: 211096730

A Simple Framework for Contrastive Learning of Visual Representations

@article{Chen2020ASF,
  title={A Simple Framework for Contrastive Learning of Visual Representations},
  author={Ting Chen and Simon Kornblith and Mohammad Norouzi and Geoffrey E. Hinton},
  journal={ArXiv},
  year={2020},
  volume={abs/2002.05709}
}
This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. In order to understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framework. We show that (1) composition of data augmentations plays a critical role in defining effective… 
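The contrastive prediction task described in the abstract is trained with a normalized temperature-scaled cross-entropy (NT-Xent) loss over pairs of augmented views. Below is a minimal PyTorch sketch of that loss, assuming each image's two views have already been encoded and projected; the batch size and temperature values are illustrative, not the paper's tuned settings.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent over 2N projections; z1[i] and z2[i] are two augmented views of image i."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit-norm rows
    sim = z @ z.t() / temperature                       # (2N, 2N) scaled cosine similarities
    sim.fill_diagonal_(-9e15)                           # mask self-similarity
    # The positive for row i is the other view of the same image: index i+n or i-n.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Illustrative usage: projections of a batch of 8 images from a 128-d projection head.
loss = nt_xent_loss(torch.randn(8, 128), torch.randn(8, 128))
```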
A Broad Study on the Transferability of Visual Representations with Contrastive Learning
TLDR
A comprehensive study of the transferability of representations learned by different contrastive approaches, covering linear evaluation, full-network transfer, and few-shot recognition on 12 downstream datasets from different domains, as well as object detection on MSCOCO and VOC0712, shows that the contrastive approaches learn representations that are easily transferable to a different downstream task.
CLAR: Contrastive Learning of Auditory Representations
TLDR
By combining all these methods, and with substantially less labeled data, the CLAR framework achieves significant improvement in prediction performance compared to the supervised approach and converges faster with significantly better representations.
G-SimCLR: Self-Supervised Contrastive Learning with Guided Projection via Pseudo Labelling
TLDR
This work proposes that, with the normalized temperature-scaled cross-entropy loss function (as used in SimCLR), it is beneficial not to have images of the same category in the same batch, and uses the latent space representation of a denoising autoencoder trained on the unlabeled dataset to obtain pseudo labels.
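A hypothetical sketch of the batch construction this summary describes: given pseudo-labels (e.g., obtained from a denoising autoencoder's latent codes), batches are formed so that no pseudo-label repeats within a batch. The function name and interface are illustrative, not from the paper.

```python
import random
from collections import defaultdict

def pseudo_label_batches(pseudo_labels, batch_size):
    """Yield batches of dataset indices in which each pseudo-label appears at most once."""
    by_label = defaultdict(list)
    for idx, label in enumerate(pseudo_labels):
        by_label[label].append(idx)
    for bucket in by_label.values():
        random.shuffle(bucket)
    while any(by_label.values()):
        # One pass draws at most one index per pseudo-label, so every chunk
        # of the pass contains only distinct labels.
        one_per_label = [b.pop() for b in by_label.values() if b]
        random.shuffle(one_per_label)
        for i in range(0, len(one_per_label), batch_size):
            yield one_per_label[i:i + batch_size]

# Example: six images with pseudo-labels 0, 0, 1, 1, 2, 2 in batches of three.
batches = list(pseudo_label_batches([0, 0, 1, 1, 2, 2], batch_size=3))
```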
Multimodal Contrastive Training for Visual Representation Learning
TLDR
This work develops an approach to learning visual representations that embraces multimodal data, driven by a combination of intra- and inter-modal similarity preservation objectives, and exploits intrinsic data properties within each modality and semantic information from cross-modal correlation simultaneously, hence improving the quality of learned visual representations.
Contrastive Representation Learning: A Framework and Review
TLDR
A general Contrastive Representation Learning framework is proposed that simplifies and unifies many different contrastive learning methods, and a taxonomy is provided for each of its components in order to summarise contrastive learning and distinguish it from other forms of machine learning.
Mutual Contrastive Learning for Visual Representation Learning
TLDR
Experimental results on image classification and transfer learning to object detection show that MCL leads to consistent performance gains, demonstrating that it can guide the network to generate better feature representations.
What Should Not Be Contrastive in Contrastive Learning
TLDR
This work introduces a contrastive learning framework which does not require prior knowledge of specific, task-dependent invariances, and learns to capture varying and invariant factors for visual representations by constructing separate embedding spaces, each of which is invariant to all but one augmentation.
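Structurally, the framework described amounts to one shared backbone feeding several projection heads, each defining an embedding space that is invariant to all but one augmentation. A minimal sketch of that wiring follows; the dimensions and head count are assumptions for illustration, and the per-space contrastive losses are omitted.

```python
import torch.nn as nn

class MultiSpaceProjector(nn.Module):
    """One projection head per augmentation family: each output embedding space
    keeps the variation of 'its' augmentation as signal instead of discarding it."""

    def __init__(self, feat_dim=2048, embed_dim=128, num_augmentations=3):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                          nn.Linear(feat_dim, embed_dim))
            for _ in range(num_augmentations)
        )

    def forward(self, features):
        # features: (B, feat_dim) from a shared backbone; one embedding per space.
        return [head(features) for head in self.heads]
```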
Efficient Visual Pretraining with Contrastive Detection
TLDR
This work introduces a new self-supervised objective, contrastive detection, which tasks representations with identifying object-level features across augmentations, leading to state-of-the-art transfer accuracy on a variety of downstream tasks, while requiring up to 10× less pretraining.
Heterogeneous Contrastive Learning: Encoding Spatial Information for Compact Visual Representations
TLDR
HCL is presented: an effective approach that adds spatial information at the encoding stage to alleviate the inconsistency between the contrastive objective and strong data augmentation operations, achieving more compact and efficient visual representations.
ImCLR: Implicit Contrastive Learning for Image Classification
TLDR
This work introduces an input construction for Implicit Contrastive Learning (ImCLR), primarily in the supervised setting, through which the network implicitly learns to differentiate between similar and dissimilar images.
...

References

SHOWING 1-10 OF 68 REFERENCES
Revisiting Self-Supervised Visual Representation Learning
TLDR
This study revisits numerous previously proposed self-supervised models, conducts a thorough large-scale study, and uncovers multiple crucial insights about standard recipes for CNN design that do not always translate to self-supervised representation learning.
Momentum Contrast for Unsupervised Visual Representation Learning
We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder, which enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning.
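A minimal sketch of the two mechanisms this entry names: a key encoder updated as a momentum (moving) average of the query encoder, and a fixed-size FIFO queue of encoded keys that serves as the dictionary of negatives. The coefficient and queue size are illustrative defaults, not the paper's settings.

```python
import torch

@torch.no_grad()
def momentum_update(query_encoder, key_encoder, m=0.999):
    """Move the key encoder toward the query encoder: k <- m*k + (1-m)*q."""
    for q_param, k_param in zip(query_encoder.parameters(), key_encoder.parameters()):
        k_param.mul_(m).add_(q_param, alpha=1 - m)

class KeyQueue:
    """Fixed-size FIFO dictionary of encoded keys (negatives for the contrastive loss)."""

    def __init__(self, dim, size=4096):
        self.keys = torch.randn(size, dim)
        self.ptr = 0

    @torch.no_grad()
    def enqueue(self, new_keys):
        # Overwrite the oldest entries with the freshly encoded keys.
        n = new_keys.size(0)
        idx = (self.ptr + torch.arange(n)) % self.keys.size(0)
        self.keys[idx] = new_keys
        self.ptr = (self.ptr + n) % self.keys.size(0)
```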
Unsupervised Visual Representation Learning by Context Prediction
TLDR
It is demonstrated that the feature representation learned using this within-image context indeed captures visual similarity across images and enables unsupervised visual discovery of objects like cats, people, and even birds in the Pascal VOC 2011 detection dataset.
Unsupervised Representation Learning by Predicting Image Rotations
TLDR
This work proposes to learn image features by training ConvNets to recognize the 2d rotation applied to their input images, and demonstrates both qualitatively and quantitatively that this apparently simple task provides a very powerful supervisory signal for semantic feature learning.
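The pretext task is simple enough to sketch directly: every image is rotated by 0, 90, 180, or 270 degrees, and the network is trained to classify which rotation was applied. A minimal PyTorch version follows; the separate backbone/head split is an assumption about wiring, not the paper's exact architecture.

```python
import torch
import torch.nn.functional as F

def rotation_batch(images):
    """Return all four rotations of each (B, C, H, W) batch plus rotation labels 0-3."""
    rotated = torch.cat([torch.rot90(images, k, dims=(2, 3)) for k in range(4)])
    labels = torch.arange(4).repeat_interleave(images.size(0))
    return rotated, labels

def rotation_loss(backbone, head, images):
    """Cross-entropy on predicting which of {0, 90, 180, 270} degrees was applied."""
    rotated, labels = rotation_batch(images)
    return F.cross_entropy(head(backbone(rotated)), labels.to(images.device))
```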
Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles
TLDR
A novel unsupervised learning approach to build features suitable for object detection and classification is presented; to facilitate the transfer of features to other tasks, the context-free network (CFN), a siamese-ennead convolutional neural network, is introduced.
DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition
TLDR
DeCAF, an open-source implementation of deep convolutional activation features, along with all associated network parameters, is released to enable vision researchers to conduct experiments with deep representations across a range of visual concept learning paradigms.
Multi-task Self-Supervised Visual Learning
TLDR
The results show that deeper networks work better, and that combining tasks, even via a naïve multihead architecture, always improves performance.
Representation Learning with Contrastive Predictive Coding
TLDR
This work proposes a universal unsupervised learning approach to extract useful representations from high-dimensional data, which it calls Contrastive Predictive Coding, and demonstrates that the approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments.
Data-Efficient Image Recognition with Contrastive Predictive Coding
TLDR
This work revisits and improves Contrastive Predictive Coding, an unsupervised objective for learning representations that make the variability in natural signals more predictable, and produces features that support state-of-the-art linear classification accuracy on the ImageNet dataset.
Unsupervised Feature Learning via Non-parametric Instance Discrimination
TLDR
This work formulates this intuition as a non-parametric classification problem at the instance level, and uses noise-contrastive estimation to tackle the computational challenges imposed by the large number of instance classes.
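A sketch of the non-parametric classifier this summary describes: every training image is its own class, and the classifier "weights" are a memory bank of stored embeddings rather than learned parameters. The full softmax shown here is what the paper's noise-contrastive estimation approximates; the bank size and temperature are illustrative.

```python
import torch
import torch.nn.functional as F

def instance_logits(features, memory_bank, temperature=0.07):
    """Logits of a non-parametric softmax: similarity of each embedding
    to every stored instance embedding (one bank row per training image)."""
    v = F.normalize(features, dim=1)        # (B, D) current batch embeddings
    bank = F.normalize(memory_bank, dim=1)  # (N, D) one entry per image
    return v @ bank.t() / temperature       # (B, N) logits over instance "classes"

# Training sketch: the target class of image i is its own index in the bank.
feats, bank = torch.randn(16, 128), torch.randn(1000, 128)
indices = torch.randint(0, 1000, (16,))
loss = F.cross_entropy(instance_logits(feats, bank), indices)
```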
...