Corpus ID: 235727265

Simpler, Faster, Stronger: Breaking The log-K Curse On Contrastive Learners With FlatNCE

Junya Chen, Zhe Gan, Xuan Li, Qing Guo, Liqun Chen, Shuyang Gao, Tagyoung Chung, Yi Xu, Belinda Zeng, Wenlian Lu, Fan Li, Lawrence Carin, Chenyang Tao
InfoNCE-based contrastive representation learners, such as SimCLR [1] and MoCo [2], have been tremendously successful in recent years. However, these contrastive schemes are notoriously resource demanding, as their effectiveness breaks down with small-batch training (i.e., the log-K curse, where K is the batch size). In this work, we reveal mathematically why contrastive learners fail in the small-batch-size regime, and present a novel, simple, non-trivial contrastive objective named FlatNCE.
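To make the log-K curse concrete, here is a minimal NumPy sketch (not the paper's code; function and variable names are illustrative) of the InfoNCE loss for a single query scored against K candidates. Because the loss is non-negative, the induced mutual-information estimate log K − loss can never exceed log K, no matter how informative the representations are.

```python
import numpy as np

def info_nce(scores, pos_index=0):
    """InfoNCE loss: cross-entropy of the positive against K candidate scores."""
    scores = np.asarray(scores, dtype=float)
    # numerically stable log-sum-exp over all K scores (positive included)
    m = scores.max()
    logsumexp = m + np.log(np.exp(scores - m).sum())
    return logsumexp - scores[pos_index]

rng = np.random.default_rng(0)
K = 8                              # "batch size" of candidates
scores = rng.normal(size=K)        # stand-in critic scores
loss = info_nce(scores)            # always >= 0
mi_estimate = np.log(K) - loss     # InfoNCE mutual-information estimate
# the log-K curse: the estimate saturates at log K regardless of the data
assert mi_estimate <= np.log(K)
```

Small K therefore caps how much mutual information the estimator can report, which is why effectiveness degrades with small-batch training.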


Provable Stochastic Optimization for Global Contrastive Learning: Small Batch Does Not Harm Performance

This paper proposes a memory-efficient stochastic optimization algorithm, named SogCLR, for solving the global objective of contrastive learning of representations, and demonstrates that it can achieve performance similar to SimCLR with a large batch size on self-supervised learning on ImageNet-1K.

Model-Aware Contrastive Learning: Towards Escaping Uniformity-Tolerance Dilemma in Training

This work proposes a new CL loss that improves the learned representations and training with small batch sizes, and reexamines, from a unified gradient-reduction perspective, why contrastive learning requires a large number of negative samples.

Contrastive Prototypical Network with Wasserstein Confidence Penalty

This work proposes a Wasserstein Confidence Penalty that imposes an appropriate penalty on overconfident predictions based on the semantic relationships among pseudo classes, and achieves state-of-the-art performance on miniImageNet and tieredImageNet under the unsupervised setting.

RényiCL: Contrastive Representation Learning with Skew Rényi Divergence

This work proposes a novel contrastive objective that performs variational estimation of a skew Rényi divergence, provides a theoretical guarantee on how variational estimation of the skew divergence leads to stable training, and shows that Rényi contrastive learning with stronger augmentations outperforms other self-supervised methods without extra regularization or computational overhead.

CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP

This work proposes the novel “Contrastive Leave One Out Boost” (CLOOB), which uses modern Hopfield networks for covariance enrichment together with the InfoLOOB objective to mitigate the saturation effect of the InfoNCE objective.
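The saturation effect mentioned here can be illustrated with a toy NumPy sketch (not CLOOB's code; function names are illustrative): InfoNCE's mutual-information estimate is capped at log K, whereas the leave-one-out InfoLOOB estimate, which omits the positive from the denominator, is not.

```python
import numpy as np

def info_nce_mi(scores, pos=0):
    """MI estimate from InfoNCE: log K minus the contrastive loss."""
    s = np.asarray(scores, dtype=float)
    loss = np.log(np.exp(s).sum()) - s[pos]
    return np.log(len(s)) - loss

def info_loob_mi(scores, pos=0):
    """MI estimate from InfoLOOB: the positive is left out of the denominator."""
    s = np.asarray(scores, dtype=float)
    negatives = np.delete(s, pos)
    return s[pos] - np.log(np.exp(negatives).mean())

scores = [5.0, 0.0, 0.1, -0.2]           # a very confident positive, K = 4
assert info_nce_mi(scores) <= np.log(4)  # InfoNCE saturates at log K
assert info_loob_mi(scores) > np.log(4)  # InfoLOOB is not capped
```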

Multi-Level Contrastive Learning for Cross-Lingual Alignment

A multi-level contrastive learning (ML-CTL) framework is proposed to further improve the cross-lingual ability of pre-trained models; it explicitly integrates the word-level information of each pair of parallel sentences into contrastive learning.

Supercharging Imbalanced Data Learning With Energy-based Contrastive Representation Transfer

This work posits a meta-distributional scenario, where the causal generating mechanism for label-conditional features is invariant across different labels, which enables efficient knowledge transfer from the dominant classes to their under-represented counterparts, even if their feature distributions show apparent disparities.

Deploying self-supervised learning in the wild for hybrid automatic speech recognition

The experimental results show that SSL pre-training with in-domain uncurated data can achieve better performance than all the alternative out-of-domain pre-training strategies.

Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization

Theoretically, it is shown that the FLO estimator is tight and converges under stochastic gradient descent, which underscores the foundational importance of variational MI estimation in data-efficient learning.

Exploring Patch-wise Semantic Relation for Contrastive Learning in Image-to-Image Translation Tasks

This work proposes a novel semantic relation consistency (SRC) regularization along with decoupled contrastive learning, which exploits diverse semantics by focusing on the heterogeneous semantics between the patches of a single image.

SimCSE: Simple Contrastive Learning of Sentence Embeddings

SimCSE is presented, a simple contrastive learning framework that greatly advances state-of-the-art sentence embeddings; it regularizes pre-trained embeddings' anisotropic space to be more uniform and better aligns positive pairs when supervised signals are available.

Hard Negative Mixing for Contrastive Learning

It is argued that an important aspect of contrastive learning, i.e., the effect of hard negatives, has so far been neglected, and hard-negative mixing strategies at the feature level are proposed that can be computed on the fly with minimal computational overhead.
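A minimal NumPy sketch of feature-level hard-negative mixing in the spirit of this paper (not its actual code; the function name and parameters are illustrative): the hardest existing negatives, ranked by similarity to the query, are convexly mixed and re-normalized to synthesize extra hard negatives on the fly.

```python
import numpy as np

def mix_hard_negatives(query, negatives, n_new=4, rng=None):
    """Synthesize n_new hard negatives by mixing the hardest existing ones."""
    rng = np.random.default_rng(rng)
    sims = negatives @ query                        # similarity to the query
    hard = negatives[np.argsort(-sims)[: max(2, n_new)]]  # hardest first
    mixed = []
    for _ in range(n_new):
        i, j = rng.choice(len(hard), size=2, replace=False)
        alpha = rng.uniform()
        m = alpha * hard[i] + (1 - alpha) * hard[j]
        mixed.append(m / np.linalg.norm(m))         # back onto the unit sphere
    return np.stack(mixed)

q = np.array([1.0, 0.0])                            # unit-norm query feature
negs = np.array([[0.8, 0.6], [0.6, 0.8], [-1.0, 0.0]])
new_negs = mix_hard_negatives(q, negs, n_new=2, rng=0)
```

The mixing happens in feature space, so no extra encoder forward passes are needed, which is the source of the minimal overhead.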

Contrastive Learning with Hard Negative Samples

A new class of unsupervised methods for selecting hard negative samples, in which the user can control the amount of hardness, is developed; it improves downstream performance across multiple modalities, requires only a few additional lines of code to implement, and introduces no computational overhead.

A Simple Framework for Contrastive Learning of Visual Representations

It is shown that the composition of data augmentations plays a critical role in defining effective predictive tasks, that introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and that contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.

Debiased Contrastive Learning

A debiased contrastive objective is developed that corrects for the sampling of same-label datapoints, even without knowledge of the true labels, and consistently outperforms the state-of-the-art for representation learning in vision, language, and reinforcement learning benchmarks.
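A hedged NumPy sketch of the debiasing idea (names and the hyperparameter `tau_plus`, an assumed class prior, are illustrative, not the paper's code): the naive average over "negatives" is corrected for the fraction that are secretly same-label positives, then clamped at its theoretical minimum.

```python
import numpy as np

def debiased_info_nce(pos_sim, neg_sims, tau_plus=0.1, t=0.5):
    """Contrastive loss with a correction for false negatives (sketch)."""
    pos = np.exp(pos_sim / t)
    negs = np.exp(np.asarray(neg_sims) / t)
    n = len(negs)
    # corrected mean negative weight, clamped at its minimum e^{-1/t}
    g = np.maximum((negs.mean() - tau_plus * pos) / (1.0 - tau_plus),
                   np.exp(-1.0 / t))
    return -np.log(pos / (pos + n * g))

# tau_plus = 0 recovers the standard (biased) objective
loss_biased = debiased_info_nce(0.9, [0.1, 0.2, 0.8], tau_plus=0.0)
loss_debiased = debiased_info_nce(0.9, [0.1, 0.2, 0.8], tau_plus=0.1)
assert loss_debiased < loss_biased  # subtracting false negatives lowers the loss
```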

A Theoretical Analysis of Contrastive Unsupervised Representation Learning

This framework allows showing provable guarantees on the performance of the learned representations on an average classification task comprising a subset of the same set of latent classes, and shows that learned representations can reduce (labeled) sample complexity on downstream tasks.

Learning word embeddings efficiently with noise-contrastive estimation

This work proposes a simple and scalable new approach to learning word embeddings based on training log-bilinear models with noise-contrastive estimation, and achieves results comparable to the best ones reported, using four times less data and more than an order of magnitude less computing time.

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

This paper empirically shows that on the ImageNet dataset large minibatches cause optimization difficulties, but when these are addressed the trained networks exhibit good generalization, enabling training of visual recognition models on internet-scale data with high efficiency.

Importance Weighted Autoencoders

The importance weighted autoencoder (IWAE), a generative model with the same architecture as the VAE but using a strictly tighter log-likelihood lower bound derived from importance weighting, is shown empirically to learn richer latent-space representations than VAEs, leading to improved test log-likelihood on density estimation benchmarks.
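A toy NumPy check of the IWAE idea (the log-weights here are random stand-ins, not a real model): with importance weights w_i = p(x, z_i)/q(z_i | x), the k-sample bound log((1/k) Σ w_i) is never looser than the averaged 1-sample ELBO (1/k) Σ log w_i, by Jensen's inequality.

```python
import numpy as np

rng = np.random.default_rng(1)
log_w = rng.normal(size=64)              # stand-in log importance weights
elbo = log_w.mean()                      # standard VAE bound (averaged k = 1)
iwae = np.log(np.mean(np.exp(log_w)))    # IWAE bound with k = 64 samples
assert iwae >= elbo                      # tighter unless all weights are equal
```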

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

This work investigates the cause of this generalization drop in the large-batch regime and presents numerical evidence supporting the view that large-batch methods tend to converge to sharp minimizers of the training and testing functions; as is well known, sharp minima lead to poorer generalization.