Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup

@inproceedings{Gao2021ScalingDC,
  title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
  author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
  booktitle={REPL4NLP},
  year={2021}
}
Contrastive learning has been applied successfully to learn vector representations of text. Previous research demonstrated that learning high-quality representations benefits from batch-wise contrastive loss with a large number of negatives. In practice, the technique of in-batch negative is used, where for each example in a batch, other batch examples’ positives will be taken as its negatives, avoiding encoding extra negatives. This, however, still conditions each example’s loss on all batch… 
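To make the in-batch negative setup concrete, the sketch below is a minimal illustration in plain PyTorch, not the paper's released code; the batch size, embedding dimension, and temperature are illustrative assumptions. It computes a contrastive loss in which each query's positive passage doubles as a negative for every other query, and the full B x B similarity matrix is what ties each example's loss, and therefore peak activation memory, to the whole batch.

import torch
import torch.nn.functional as F

def in_batch_negative_loss(q_reps, p_reps, temperature=0.05):
    # q_reps: [B, d] query embeddings; p_reps: [B, d] passage embeddings,
    # where p_reps[i] is the positive for q_reps[i] and every other row
    # serves as an in-batch negative (no extra negatives are encoded).
    logits = q_reps @ p_reps.t() / temperature                      # [B, B] similarities
    targets = torch.arange(q_reps.size(0), device=q_reps.device)    # diagonal = positives
    # Softmax over each full row couples one example's loss to all B passages,
    # which is why memory use grows with the batch size.
    return F.cross_entropy(logits, targets)

# Toy usage with random unit vectors standing in for encoder outputs
# (batch size 8 and dimension 768 are arbitrary choices).
q = F.normalize(torch.randn(8, 768), dim=-1)
p = F.normalize(torch.randn(8, 768), dim=-1)
print(in_batch_negative_loss(q, p).item())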

Citations

Contrastive Data and Learning for Natural Language Processing
TLDR
This tutorial intends to help researchers in the NLP and computational linguistics community understand this emerging topic and promote future research directions in using contrastive learning for NLP applications.
LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval
TLDR
This paper proposes LaPraDoR, a pretrained dual-tower dense retriever that does not require any supervised data for training, and proposes Lexicon-Enhanced Dense Retrieval (LEDR) as a simple yet effective way to enhance dense retrieval with lexical matching.
Tevatron: An Efficient and Flexible Toolkit for Dense Retrieval
TLDR
Tevatron is presented, a dense retrieval toolkit optimized for efficiency, flexibility, and code simplicity that provides a standardized pipeline for dense retrieval including text processing, model training, corpus/query encoding, and search.
CodeRetriever: Unimodal and Bimodal Contrastive Learning
TLDR
The CodeRetriever model combines unimodal and bimodal contrastive learning to train function-level code semantic representations, specifically for the code search task, and achieves new state-of-the-art performance with significant improvements over existing code pre-trained models.
Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval
Recent research demonstrates the effectiveness of using fine-tuned language models (LM) for dense retrieval. However, dense retrievers are hard to train, typically requiring heavily engineered fine-tuning pipelines to realize their full potential.
Condenser: a Pre-training Architecture for Dense Retrieval
TLDR
This paper proposes to pre-train towards a dense encoder with a novel Transformer architecture, Condenser, in which LM prediction CONditions on DENSE Representation, and shows that it improves over standard LM pre-training by large margins on various text retrieval and similarity tasks.
TSPLIT: Fine-grained GPU Memory Management for Efficient DNN Training via Tensor Splitting
TLDR
TSPLIT is a fine-grained DNN memory management system that breaks apart memory bottlenecks while maintaining the efficiency of DNN training, using a model-guided approach that holistically exploits tensor splitting and its joint optimization with out-of-core execution methods (via offload and recompute).
Reaching for the Sky: Maximizing Deep Learning Inference Throughput on Edge Devices with AI Multi-tenancy
TLDR
This study investigates system techniques, such as batched inferencing, AI multi-tenancy, and clusters of AI accelerators, which can significantly enhance the overall inference throughput on edge devices running DL models for image classification tasks.
Federated Momentum Contrastive Clustering
TLDR
FedMCC can easily be adapted to ordinary centralized clustering through what is called momentum contrastive clustering (MCC), and it is shown that MCC achieves state-of-the-art clustering accuracy on certain datasets such as STL-10 and ImageNet-10.
Conditional Supervised Contrastive Learning for Fair Text Classification
TLDR
This work theoretically analyzes the connections between learning representations with fairness constraints and conditional supervised contrastive objectives, and proposes to use conditional supervised contrastive objectives to learn fair representations for text classification.
...

References

Showing 1-10 of 17 references
CLEAR: Contrastive Learning for Sentence Representation
TLDR
This paper proposes Contrastive LEArning for sentence Representation (CLEAR), which employs multiple sentence-level augmentation strategies to learn a noise-invariant sentence representation, and investigates, through numerous experiments, the key reasons that make contrastive learning effective.
DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations
TLDR
Inspired by recent advances in deep metric learning (DML), this work carefully designs a self-supervised objective for learning universal sentence embeddings that does not require labelled training data and closes the performance gap between unsupervised and supervised pretraining for universal sentence encoders.
Dense Passage Retrieval for Open-Domain Question Answering
TLDR
This work shows that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework.
Latent Retrieval for Weakly Supervised Open Domain Question Answering
TLDR
It is shown for the first time that it is possible to jointly learn the retriever and reader from question-answer string pairs without any IR system, outperforming BM25 by up to 19 points in exact match.
RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering
TLDR
This work proposes an optimized training approach, called RocketQA, to improve dense passage retrieval; it significantly outperforms previous state-of-the-art models on both MS MARCO and Natural Questions and demonstrates that end-to-end QA performance can be improved with the RocketQA retriever.
Supervised Contrastive Learning
TLDR
A novel training methodology is proposed that consistently outperforms cross entropy on supervised learning tasks across different architectures and data augmentations, by modifying the batch contrastive loss, which has recently been shown to be very effective at learning powerful representations in the self-supervised setting.
A Simple Framework for Contrastive Learning of Visual Representations
TLDR
It is shown that the composition of data augmentations plays a critical role in defining effective predictive tasks, that introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and that contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.
Pre-training Tasks for Embedding-based Large-scale Retrieval
TLDR
It is shown that the key ingredient of learning a strong embedding-based Transformer model is the set of pre-training tasks, and that with adequately designed paragraph-level pre-training tasks, Transformer models can remarkably improve over the widely-used BM25 as well as embedding models without Transformers.
ZeRO: Memory optimizations Toward Training Trillion Parameter Models
TLDR
ZeRO eliminates memory redundancies in data- and model-parallel training while retaining low communication volume and high computational granularity, allowing us to scale the model size proportional to the number of devices with sustained high efficiency.
Natural Questions: A Benchmark for Question Answering Research
TLDR
The Natural Questions corpus, a question answering data set, is presented, introducing robust metrics for the purposes of evaluating question answering systems; demonstrating high human upper bounds on these metrics; and establishing baseline results using competitive methods drawn from related literature.
...