Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup

@inproceedings{Gao2021ScalingDC,
  title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
  author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
  booktitle={REPL4NLP},
  year={2021}
}
Contrastive learning has been applied successfully to learn vector representations of text. Previous research demonstrated that learning high-quality representations benefits from a batch-wise contrastive loss with a large number of negatives. In practice, the technique of in-batch negatives is used: for each example in a batch, the other batch examples’ positives are taken as its negatives, avoiding the need to encode extra negatives. This, however, still conditions each example’s loss on all batch…
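
As a concrete sketch of the in-batch negative technique described above, the following PyTorch-style snippet computes a contrastive loss in which each example's positive also serves as a negative for every other example in the batch. It is a minimal, hypothetical illustration (the names in_batch_negative_loss, q_reps, p_reps and the temperature value 0.05 are assumptions made for the example), not the paper's released implementation.

import torch
import torch.nn.functional as F

def in_batch_negative_loss(q_reps, p_reps, temperature=0.05):
    # q_reps: [B, d] anchor representations; p_reps: [B, d] positives aligned row-wise.
    # For row i, p_reps[i] is the positive and every other row of p_reps is a negative,
    # so no additional negatives have to be encoded.
    scores = q_reps @ p_reps.T / temperature          # [B, B] similarity matrix
    targets = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, targets)           # softmax over each row of scores

# toy usage: a batch of 8 pairs with 128-dimensional embeddings
q = F.normalize(torch.randn(8, 128, requires_grad=True), dim=-1)
p = F.normalize(torch.randn(8, 128, requires_grad=True), dim=-1)
loss = in_batch_negative_loss(q, p)
loss.backward()

Because the [B, B] score matrix ties each example's loss to every other example in the batch, all B representations (and their activations) must be held in memory at once, which is exactly the constraint the paper addresses when scaling the batch size.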

Citations

LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval
TLDR: This paper proposes LaPraDoR, a pretrained dual-tower dense retriever that does not require any supervised data for training, and proposes Lexicon-Enhanced Dense Retrieval (LEDR) as a simple yet effective way to enhance dense retrieval with lexical matching.

Tevatron: An Efficient and Flexible Toolkit for Dense Retrieval
TLDR: Tevatron is presented, a dense retrieval toolkit optimized for efficiency, flexibility, and code simplicity that provides a standardized pipeline for dense retrieval including text processing, model training, corpus/query encoding, and search.

CodeRetriever: Unimodal and Bimodal Contrastive Learning
TLDR: The CodeRetriever model, which combines unimodal and bimodal contrastive learning to train function-level code semantic representations, specifically for the code search task, achieves new state-of-the-art performance with significant improvement over existing code pre-trained models.

Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval
Recent research demonstrates the effectiveness of using fine-tuned language models (LM) for dense retrieval. However, dense retrievers are hard to train, typically requiring heavily engineered…

Condenser: a Pre-training Architecture for Dense Retrieval
TLDR: This paper proposes to pre-train towards a dense encoder with a novel Transformer architecture, Condenser, where LM prediction CONditions on DENSE Representation, improving over standard LMs by large margins on various text retrieval and similarity tasks.

Reaching for the Sky: Maximizing Deep Learning Inference Throughput on Edge Devices with AI Multi-tenancy
The wide adoption of smart devices and Internet-of-Things (IoT) sensors has led to massive growth in data generation at the edge of the Internet over the past decade. Intelligent real-time analysis…

Federated Momentum Contrastive Clustering
TLDR: FedMCC can easily be adapted to ordinary centralized clustering through what is called momentum contrastive clustering (MCC), and it is shown that MCC achieves state-of-the-art clustering accuracy on certain datasets such as STL-10 and ImageNet-10.

Conditional Supervised Contrastive Learning for Fair Text Classification
TLDR: This work theoretically analyzes the connections between learning representations with a fairness constraint and conditional supervised contrastive objectives, and proposes to use conditional supervised contrastive objectives to learn fair representations for text classification via contrastive learning.

i-Code: An Integrative and Composable Multimodal Learning Framework
TLDR: Experimental results demonstrate how i-Code can outperform state-of-the-art techniques on video understanding tasks and the GLUE NLP benchmark, improving by as much as 11% and demonstrating the power of integrative multimodal pretraining.

C3: Continued Pretraining with Contrastive Weak Supervision for Cross Language Ad-Hoc Retrieval
TLDR: This work uses comparable Wikipedia articles in different languages to further pretrain off-the-shelf multilingual pretrained models before fine-tuning on the retrieval task, and shows that this approach yields improvements in retrieval effectiveness.

...
