DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations

@article{Giorgi2021DeCLUTRDC,
  title={DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations},
  author={John Giorgi and Osvald Nitski and Gary D Bader and Bo Wang},
  journal={ArXiv},
  year={2021},
  volume={abs/2006.03659}
}
Sentence embeddings are an important component of many natural language processing (NLP) systems. Like word embeddings, sentence embeddings are typically learned on large text corpora and then transferred to various downstream tasks, such as clustering and retrieval. Unlike word embeddings, the highest performing solutions for learning sentence embeddings require labelled data, limiting their usefulness to languages and domains where labelled data is abundant. In this paper, we present DeCLUTR… 
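
The title and abstract describe contrastive learning of sentence embeddings without labelled data. As an illustration only, here is a minimal sketch of a standard InfoNCE-style contrastive loss with in-batch negatives; the truncated abstract does not specify the paper's exact objective, so the embedding size and temperature below are assumptions.

import torch
import torch.nn.functional as F

def info_nce_loss(anchor_emb: torch.Tensor,
                  positive_emb: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """anchor_emb, positive_emb: (batch, dim) embeddings of paired text spans."""
    anchor = F.normalize(anchor_emb, dim=-1)
    positive = F.normalize(positive_emb, dim=-1)
    # Cosine-similarity logits: each anchor is scored against every positive in the batch.
    logits = anchor @ positive.t() / temperature  # (batch, batch)
    # The matching positive sits on the diagonal; all other entries act as negatives.
    targets = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, targets)

# Random tensors stand in for encoder outputs in this sketch.
a = torch.randn(8, 768, requires_grad=True)
p = torch.randn(8, 768, requires_grad=True)
info_nce_loss(a, p).backward()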

Citations

AugCSE: Contrastive Sentence Embedding with Diverse Augmentations

AugCSE, a unified framework that exploits diverse sets of data augmentations to achieve a better, general-purpose sentence embedding model, is presented, showing that diverse augmentations can be tamed to produce a better and more robust sentence representation.

Contrastive Pre-training of Spatial-Temporal Trajectory Embeddings

A novel Contrastive Spatial-Temporal Trajectory Embedding (CSTTE) model is proposed; it adopts the contrastive learning framework so that its pretext task is robust to noise, and it comprehensively models the long-term spatial-temporal correlations in trajectories.

EASE: Entity-Aware Contrastive Learning of Sentence Embedding

It is shown that EASE exhibits competitive or better performance on English semantic textual similarity (STS) and short text clustering (STC) tasks, and that it significantly outperforms baseline methods in multilingual settings on a variety of tasks.

DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings

This work proposes DiffCSE, an unsupervised contrastive learning framework for learning sentence embeddings that are sensitive to the difference between the original sentence and an edited sentence, and shows that DiffCSE is an instance of equivariant contrastive learning, which generalizes contrastive learning and learns representations that are insensitive to certain types of augmentations and sensitive to other “harmful” types of augmentations.

Text Transformations in Contrastive Self-Supervised Learning: A Review

The contrastive learning framework is formalized, the considerations that need to be addressed in the data transformation step are emphasized, and the state-of-the-art methods and evaluations for contrastive representation learning in NLP are reviewed.

A Mutually Reinforced Framework for Pretrained Sentence Embeddings

This work proposes a novel framework, InfoCSE, which leverages the sentence representation model itself to realize the following iterative self-supervision process: on one hand, improving the sentence representation contributes to the quality of data annotation; on the other hand, more effective data annotation helps to generate high-quality positive samples, which further improve the current sentence representation model.

Exploring the Impact of Negative Samples of Contrastive Learning: A Case Study of Sentence Embedding

MoCoSE, a momentum contrastive learning model with a negative sample queue for sentence embedding, is presented; a prediction layer is added to the online branch to make the model asymmetric, which, together with the EMA update mechanism of the target branch, prevents the model from collapsing.
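
The summary above names two mechanisms that prevent collapse: an EMA-updated target branch and a queue of negative samples. A hedged sketch of those two mechanics follows; the module names, embedding size and queue size are illustrative assumptions, not the paper's code.

import torch
import torch.nn as nn

@torch.no_grad()
def ema_update(online: nn.Module, target: nn.Module, momentum: float = 0.999):
    """Move the target (momentum) branch slowly toward the online branch."""
    for p_online, p_target in zip(online.parameters(), target.parameters()):
        p_target.data.mul_(momentum).add_(p_online.data, alpha=1.0 - momentum)

class NegativeQueue:
    """FIFO queue of past target-branch embeddings reused as extra negatives."""
    def __init__(self, dim: int = 768, size: int = 4096):
        self.queue = torch.randn(size, dim)
        self.ptr = 0

    @torch.no_grad()
    def enqueue(self, keys: torch.Tensor):
        n = keys.size(0)
        idx = torch.arange(self.ptr, self.ptr + n) % self.queue.size(0)
        self.queue[idx] = keys
        self.ptr = (self.ptr + n) % self.queue.size(0)

# Example: keep the target encoder an exponential moving average of the online one.
online_encoder, target_encoder = nn.Linear(768, 768), nn.Linear(768, 768)
ema_update(online_encoder, target_encoder)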

Text and Code Embeddings by Contrastive Pre-Training

It is shown that contrastive pre-training on unsupervised data at scale leads to high quality vector representations of text and code.

Efficient Contrastive Learning via Novel Data Augmentation and Curriculum Learning

EfficientCL, a memory-efficient continual pretraining method that applies contrastive learning with novel data augmentation and curriculum learning, is proposed; it needs only 70% of the computational memory of the baseline model.

Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation

This work proposes a data-efficient contrastive distillation method that uses soft labels to learn from noisy image-text pairs; with a ResNet50 image encoder and a DeCLUTR text encoder, it exceeds the previous SoTA for general zero-shot learning on ImageNet 21k+1k by a relative 73%.
...

References

SHOWING 1-10 OF 96 REFERENCES

SentEval: An Evaluation Toolkit for Universal Sentence Representations

We introduce SentEval, a toolkit for evaluating the quality of universal sentence representations. SentEval encompasses a variety of tasks, including binary and multi-class classification, natural language inference and sentence similarity.

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity, is presented.
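
A minimal usage sketch of the siamese, cosine-similarity setup described above, using the sentence-transformers library; the checkpoint name is just a commonly available example, not necessarily the model evaluated in the paper.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = ["A man is playing a guitar.", "Someone is playing an instrument."]
# Each sentence is encoded independently (siamese setup) into a fixed-size vector.
embeddings = model.encode(sentences, convert_to_tensor=True)
# Embeddings are compared directly with cosine similarity.
score = util.cos_sim(embeddings[0], embeddings[1])
print(float(score))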

Supervised Learning of Universal Sentence Representations from Natural Language Inference Data

It is shown how universal sentence representations trained using the supervised data of the Stanford Natural Language Inference datasets can consistently outperform unsupervised methods like SkipThought vectors on a wide range of transfer tasks.

A large annotated corpus for learning natural language inference

The Stanford Natural Language Inference corpus is introduced, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning, which allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.

Sentence-BERT: Sentence embeddings using Siamese BERT-networks

  • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
  • 2019

RoBERTa: A Robustly Optimized BERT Pretraining Approach

It is found that BERT was significantly undertrained and can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.

SpanBERT: Improving Pre-training by Representing and Predicting Spans

The approach extends BERT by masking contiguous random spans, rather than random tokens, and training the span boundary representations to predict the entire content of the masked span, without relying on the individual token representations within it.
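
A hedged sketch of the span-masking step described above: contiguous spans, rather than individual tokens, are selected until a masking budget is reached, with span lengths drawn from a clipped geometric distribution. The specific numbers (15% budget, p = 0.2, maximum span of 10) are assumptions for illustration.

import random

def sample_span_mask(num_tokens: int, mask_ratio: float = 0.15,
                     p: float = 0.2, max_span: int = 10) -> set:
    """Return a set of token positions to mask, chosen as contiguous spans."""
    budget = int(num_tokens * mask_ratio)
    masked: set = set()
    while len(masked) < budget:
        # Span length ~ geometric(p), clipped to max_span.
        length = 1
        while random.random() > p and length < max_span:
            length += 1
        start = random.randrange(0, max(1, num_tokens - length))
        masked.update(range(start, start + length))
    return masked

print(sorted(sample_span_mask(50)))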

Reducing BERT Pre-Training Time from 3 Days to 76 Minutes

The LAMB optimizer is proposed, which helps to scale the batch size to 65536 without losing accuracy, and is a general optimizer that works for both small and large batch sizes and does not need hyper-parameter tuning besides the learning rate.

Cross-lingual Language Model Pretraining

This work proposes two methods to learn cross-lingual language models (XLMs): one unsupervised that only relies on monolingual data, and one supervised that leverages parallel data with a new cross-lingual language model objective.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
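
A minimal sketch of the "one additional output layer" fine-tuning pattern mentioned above, using the Hugging Face transformers library; the checkpoint name and the number of labels are illustrative assumptions.

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# A single, randomly initialized classification head is placed on top of the
# pretrained bidirectional encoder.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
inputs = tokenizer("This movie was great!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # torch.Size([1, 2]): one score per label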
...