Corpus ID: 233025252

WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach

@inproceedings{Huang2021WhiteningBERTAE,
  title={WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach},
  author={Junjie Huang and Duyu Tang and Wanjun Zhong and Shuai Lu and Linjun Shou and Ming Gong and Daxin Jiang and Nan Duan},
  booktitle={EMNLP},
  year={2021}
}
Producing the embedding of a sentence in an unsupervised way is valuable to natural language matching and retrieval problems in practice. In this work, we conduct a thorough examination of unsupervised sentence embeddings from pretrained models. We study four pretrained models and conduct extensive experiments on seven datasets regarding sentence semantics. We have three main findings. First, averaging all tokens is better than only using the [CLS] vector. Second, combining both top and bottom…
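
The pooling recipe sketched in the abstract can be illustrated in a few lines. The snippet below is a minimal sketch rather than the authors' released code: the model name, the Hugging Face transformers API, and the specific bottom/top layer pair are assumptions made for illustration.

```python
# Minimal sketch of the pooling described above: average token vectors instead of
# taking [CLS], and combine a bottom layer with the top layer.
# Model name, library, and layer choice are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def embed(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden_states = model(**batch).hidden_states      # embedding layer + 12 transformer layers
    layers = (hidden_states[1] + hidden_states[-1]) / 2   # combine a bottom and the top layer
    mask = batch["attention_mask"].unsqueeze(-1).float()  # mask out padding tokens
    return (layers * mask).sum(dim=1) / mask.sum(dim=1)   # mean over tokens -> (batch, dim)
```

The whitening step that gives the paper its title is then applied to the pooled sentence vectors; a sketch of that transform accompanies the whitening reference listed below.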

Citations

Learning Kernel-Smoothed Machine Translation with Retrieved Examples
  • Qingnan Jiang, Mingxuan Wang, Jun Cao, Shanbo Cheng, Shujian Huang, Lei Li
  • Computer Science
  • EMNLP
  • 2021
TLDR
This work proposes to learn Kernel-Smoothed Translation with Example Retrieval (KSTER), an effective approach to adapt neural machine translation models online, and shows that even without expensive retraining, KSTER achieves improvements of 1.1 to 1.5 BLEU over the best existing online adaptation methods.
SimCSE: Simple Contrastive Learning of Sentence Embeddings
  • Tianyu Gao, Xingcheng Yao, Danqi Chen
  • Computer Science
  • EMNLP
  • 2021
TLDR
This paper describes an unsupervised approach that takes an input sentence and predicts itself in a contrastive objective, with only standard dropout used as noise, and shows that contrastive learning theoretically regularizes pretrained embeddings’ anisotropic space to be more uniform and better aligns positive pairs when supervised signals are available.
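
As a concrete illustration of the objective summarized above, the following is a minimal sketch of the contrastive loss, not the SimCSE reference implementation. It assumes z1 and z2 are embeddings of the same batch of sentences obtained from two forward passes with independent dropout masks, so each sentence's second view is its positive and the other sentences in the batch serve as negatives.

```python
# Minimal sketch of the dropout-as-noise contrastive objective described above.
# z1, z2: (batch, dim) embeddings of the same sentences from two dropout-noised passes.
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, temperature=0.05):
    # pairwise cosine similarities, scaled by temperature -> (batch, batch)
    sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1) / temperature
    # the positive for sentence i is row i of z2; all other rows act as in-batch negatives
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(sim, labels)
```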

References

Showing 1-10 of 47 references
Language-agnostic BERT Sentence Embedding
TLDR
This work adapts multilingual BERT to produce language-agnostic sentence embeddings for 109 languages that improve average bi-text retrieval accuracy over 112 languages to 83.7%, well above the 65.5% achieved by the prior state-of-the-art on Tatoeba.
On the Sentence Embeddings from Pre-trained Language Models
TLDR
This paper proposes to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective and achieves significant performance gains over the state-of-the-art sentence embeddings on a variety of semantic textual similarity tasks.
An Unsupervised Sentence Embedding Method by Mutual Information Maximization
TLDR
Experimental results show that the proposed lightweight extension on top of BERT significantly outperforms other unsupervised sentence embedding baselines on common semantic textual similarity (STS) tasks and downstream supervised tasks, and achieves performance competitive with supervised methods on various tasks.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
RoBERTa: A Robustly Optimized BERT Pretraining Approach
TLDR
It is found that BERT was significantly undertrained and, with an improved pretraining recipe, can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
  • EMNLP/IJCNLP
  • 2019
A Simple but Tough-to-Beat Baseline for Sentence Embeddings
Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks
TLDR
This work presents a simple yet efficient data augmentation strategy called Augmented SBERT, where the cross-encoder is used to label a larger set of input pairs to augment the training data for the bi-encoder, and shows that selecting the sentence pairs in this process is non-trivial and crucial for the success of the method.
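
The augmentation loop described above can be sketched with the sentence-transformers library. This is a hypothetical example, not the paper's released code: the model names, the example pairs, and the training settings are placeholders.

```python
# Hypothetical sketch: label unlabeled pairs with a cross-encoder, then train a bi-encoder on them.
from torch.utils.data import DataLoader
from sentence_transformers import CrossEncoder, SentenceTransformer, InputExample, losses

cross_encoder = CrossEncoder("cross-encoder/stsb-roberta-base")  # assumed already trained on gold pairs
bi_encoder = SentenceTransformer("bert-base-uncased")            # plain BERT wrapped with mean pooling

# Unlabeled sentence pairs sampled from the target domain (placeholders here).
pairs = [
    ("A man is playing a guitar.", "Someone plays an instrument."),
    ("The weather is cold today.", "A dog runs across the field."),
]
silver_scores = cross_encoder.predict(pairs)  # cross-encoder provides the "silver" labels

silver_data = [InputExample(texts=list(p), label=float(s)) for p, s in zip(pairs, silver_scores)]
loader = DataLoader(silver_data, shuffle=True, batch_size=16)
bi_encoder.fit(
    train_objectives=[(loader, losses.CosineSimilarityLoss(bi_encoder))],
    epochs=1,
)
```
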
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
TLDR
A new model architecture DeBERTa (Decoding-enhanced BERT with disentangled attention) is proposed that improves the BERT and RoBERTa models using two novel techniques that significantly improve the efficiency of model pre-training and performance of downstream tasks.
Whitening Sentence Representations for Better Semantics and Faster Retrieval
TLDR
The whitening operation from traditional machine learning can similarly enhance the isotropy of sentence representations and achieves competitive results, while also being capable of reducing the dimensionality of the sentence representation.
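
A minimal sketch of the whitening transform referred to above, assuming the sentence vectors are stacked in a NumPy array of shape (n_sentences, dim); keeping only the first k columns of the whitening matrix gives the dimensionality reduction mentioned in the summary.

```python
# Minimal sketch of whitening sentence vectors; not the paper's released code.
import numpy as np

def whiten(embeddings, k=None):
    # center the vectors, then decorrelate and rescale via the SVD of the covariance matrix
    mu = embeddings.mean(axis=0, keepdims=True)
    cov = np.cov((embeddings - mu).T)    # (dim, dim) covariance
    u, s, _ = np.linalg.svd(cov)         # cov = u @ diag(s) @ u.T
    w = u @ np.diag(1.0 / np.sqrt(s))    # whitening matrix
    if k is not None:
        w = w[:, :k]                     # optional dimensionality reduction
    return (embeddings - mu) @ w
```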