Corpus ID: 235485473

Investigating the Role of Negatives in Contrastive Representation Learning

@article{Ash2021InvestigatingTR,
  title={Investigating the Role of Negatives in Contrastive Representation Learning},
  author={J. T. Ash and Surbhi Goel and A. Krishnamurthy and Dipendra Kumar Misra},
  journal={ArXiv},
  year={2021},
  volume={abs/2106.09943}
}
Noise contrastive learning is a popular technique for unsupervised representation learning. In this approach, a representation is obtained via reduction to supervised learning: given a notion of semantic similarity, the learner tries to distinguish a similar (positive) example from a collection of random (negative) examples. The success of modern contrastive learning pipelines relies on many parameters such as the choice of data augmentation, the number of negative examples, and the batch…
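
As a concrete illustration of the setup described in the abstract, the sketch below implements an InfoNCE-style objective in PyTorch, in which an anchor representation must identify its positive among a collection of random negatives. This is illustrative only, not code from the paper; the function name, temperature value, and tensor shapes are assumptions chosen for the example. The num_negatives dimension is the quantity whose role the paper investigates.

import torch
import torch.nn.functional as F


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss (illustrative sketch).

    anchor:    (batch, dim) anchor representations
    positive:  (batch, dim) representations of semantically similar examples
    negatives: (batch, num_negatives, dim) representations of random examples
    """
    # Normalize so dot products are cosine similarities.
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    # Similarity of each anchor to its positive: (batch, 1)
    pos_sim = (anchor * positive).sum(dim=-1, keepdim=True)
    # Similarity of each anchor to its negatives: (batch, num_negatives)
    neg_sim = torch.einsum("bd,bnd->bn", anchor, negatives)

    # The positive occupies index 0; the loss is cross-entropy over
    # (1 + num_negatives) candidates, i.e. "pick out the positive".
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)


# Example usage with arbitrary shapes:
# loss = contrastive_loss(torch.randn(32, 128), torch.randn(32, 128), torch.randn(32, 16, 128))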

References

Showing 1-10 of 33 references.

A Theoretical Analysis of Contrastive Unsupervised Representation Learning
This framework allows provable guarantees on the performance of the learned representations on an average classification task composed of a subset of the same set of latent classes, and shows that learned representations can reduce (labeled) sample complexity on downstream tasks.

Contrastive Representation Learning: A Framework and Review
A general contrastive representation learning framework is proposed that simplifies and unifies many different contrastive learning methods, together with a taxonomy of its components that summarizes contrastive learning and distinguishes it from other forms of machine learning.

LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding
Proposes LRC-BERT, a knowledge distillation method based on contrastive learning that fits the output of the intermediate layer from the angular-distance aspect, which is not considered by existing distillation methods.

On Mutual Information Maximization for Representation Learning
Argues, and provides empirical evidence, that the success of these methods cannot be attributed to the properties of mutual information alone, and that they depend strongly on the inductive bias in both the choice of feature extractor architecture and the parametrization of the employed MI estimators.

Contrastive Estimation: Training Log-Linear Models on Unlabeled Data
Describes a novel approach, contrastive estimation, which outperforms EM, is more robust to degradations of the dictionary, and can largely recover by modeling additional features.

Contrastive Learning of Structured World Models
Experiments demonstrate that C-SWMs can overcome limitations of models based on pixel reconstruction and outperform typical representatives of this model class in highly structured environments, while learning interpretable object-based representations.

Contrastive estimation reveals topic posterior information to linear models
Proves that contrastive learning can recover a representation of documents that reveals their underlying topic posterior information to linear models, and shows empirically that linear classifiers with these representations perform well in document classification tasks with very few training examples.

Representation Learning with Contrastive Predictive Coding
Proposes Contrastive Predictive Coding, a universal unsupervised learning approach for extracting useful representations from high-dimensional data, and demonstrates strong performance on four distinct domains: speech, images, text, and reinforcement learning in 3D environments.

Predicting What You Already Know Helps: Provable Self-Supervised Learning
Quantifies how approximate independence between the components of the pretext task (conditional on the label and latent variables) makes it possible to learn representations that solve the downstream task with drastically reduced sample complexity by training only a linear layer on top of the learned representation.

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
The contextual representations learned by the proposed replaced-token-detection pre-training task substantially outperform those learned by methods such as BERT and XLNet given the same model size, data, and compute.