Corpus ID: 235727265

Simpler, Faster, Stronger: Breaking The log-K Curse On Contrastive Learners With FlatNCE

Authors: Junya Chen, Zhe Gan, Xuan Li, Qing Guo, Liqun Chen, Shuyang Gao, Tagyoung Chung, Yi Xu, Belinda Zeng, Wenlian Lu, Fan Li, Lawrence Carin, Chenyang Tao
InfoNCE-based contrastive representation learners, such as SimCLR [1], have been tremendously successful in recent years. However, these contrastive schemes are notoriously resource demanding, as their effectiveness breaks down with small-batch training (i.e., the log-K curse, where K is the batch size). In this work, we reveal mathematically why contrastive learners fail in the small-batch-size regime, and present a novel, simple, non-trivial contrastive objective named FlatNCE, which fixes…
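The log-K ceiling mentioned in the abstract can be illustrated numerically. The following is a minimal sketch, assuming the standard in-batch InfoNCE estimator (log K plus the mean log-softmax of the positive score). The function name `infonce_estimate` and the toy critic matrices are illustrative assumptions, not taken from the paper, and the sketch does not implement FlatNCE.

```python
import numpy as np

def infonce_estimate(scores):
    """In-batch InfoNCE estimate (in nats) from a (K, K) critic matrix.

    scores[i, j] is a hypothetical critic value f(x_i, y_j); the diagonal
    holds the positive pairs, every other entry in a row is a negative.
    """
    K = scores.shape[0]
    # Numerically stable row-wise log-sum-exp.
    row_max = scores.max(axis=1, keepdims=True)
    lse = row_max[:, 0] + np.log(np.exp(scores - row_max).sum(axis=1))
    # The log-softmax of the positive is <= 0, so the estimate is <= log K.
    log_softmax_pos = np.diag(scores) - lse
    return float(np.mean(log_softmax_pos) + np.log(K))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for K in (4, 64, 1024):
        near_perfect = 50.0 * np.eye(K)       # positives dominate all negatives
        noisy = rng.normal(size=(K, K))       # uninformative critic
        print(K, np.log(K), infonce_estimate(near_perfect), infonce_estimate(noisy))
```

Even with a critic that separates positives from negatives almost perfectly, the estimate saturates at log K, so a small batch caps how much dependence the objective can register.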


SimCSE: Simple Contrastive Learning of Sentence Embeddings
This paper describes an unsupervised approach that takes an input sentence and predicts itself in a contrastive objective, with only standard dropout used as noise, and shows that contrastive learning theoretically regularizes pretrained embeddings’ anisotropic space to be more uniform and better aligns positive pairs when supervised signals are available.
A Theoretical Analysis of Contrastive Unsupervised Representation Learning
This framework allows us to show provable guarantees on the performance of the learned representations on the average classification task, which comprises a subset of the same set of latent classes, and to show that learned representations can reduce (labeled) sample complexity on downstream tasks.
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
This paper empirically shows that on the ImageNet dataset large minibatches cause optimization difficulties, but when these are addressed the trained networks exhibit good generalization and enable training visual recognition models on internet-scale data with high efficiency.
Learning word embeddings efficiently with noise-contrastive estimation
This work proposes a simple and scalable new approach to learning word embeddings based on training log-bilinear models with noise-contrastive estimation, and achieves results comparable to the best ones reported, using four times less data and more than an order of magnitude less computing time.
Contrastive Representation Learning: A Framework and Review
A general Contrastive Representation Learning framework is proposed that simplifies and unifies many different contrastive learning methods and a taxonomy for each of the components is provided in order to summarise and distinguish it from other forms of machine learning.
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
This work investigates the cause of this generalization drop in the large-batch regime and presents numerical evidence supporting the view that large-batch methods tend to converge to sharp minimizers of the training and testing functions, and, as is well known, sharp minima lead to poorer generalization.
Importance Weighted Autoencoders
The importance weighted autoencoder (IWAE) is a generative model with the same architecture as the VAE, but it uses a strictly tighter log-likelihood lower bound derived from importance weighting. It is shown empirically that IWAEs learn richer latent space representations than VAEs, leading to improved test log-likelihood on density estimation benchmarks.
Noise Contrastive Estimation and Negative Sampling for Conditional Models: Consistency and Statistical Efficiency
It is shown that the ranking-based variant of NCE gives consistent parameter estimates under weaker assumptions than the classification-based method, which is closely related to negative sampling methods, now widely used in NLP.
Representation Learning with Contrastive Predictive Coding
This work proposes a universal unsupervised learning approach to extract useful representations from high-dimensional data, which it calls Contrastive Predictive Coding, and demonstrates that the approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments.
Wasserstein Dependency Measure for Representation Learning
It is empirically demonstrated that mutual information-based representation learning approaches do fail to learn complete representations on a number of designed and real-world tasks, and a practical approximation to this theoretically motivated solution, constructed using Lipschitz constraint techniques from the GAN literature, achieves substantially improved results on tasks where incomplete representations are a major challenge.