Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Nils Reimers and Iryna Gurevych
BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) have set a new state-of-the-art performance on sentence-pair regression tasks like semantic textual similarity (STS). [...] Key Result: We evaluate SBERT and SRoBERTa on common STS tasks and transfer learning tasks, where they outperform other state-of-the-art sentence embedding methods.

Evaluation of BERT and ALBERT Sentence Embedding Performance on Downstream NLP Tasks
This paper takes a modified BERT network with siamese and triplet network structures and replaces BERT with ALBERT to create Sentence-ALBERT (SALBERT), and evaluates the performance of all sentence-embedding models considered using the STS and NLI datasets.
Dual-View Distilled BERT for Sentence Embedding
Experiments on six STS tasks show that the proposed Dual-view distilled BERT (DvBERT) for sentence matching with sentence embeddings outperforms the state-of-the-art sentence embedding methods.
SGPT: GPT Sentence Embeddings for Semantic Search
SGPT-BE and SGPT-CE are presented for applying GPT models as Bi-Encoders or Cross-Encoders to symmetric or asymmetric search, and performance scales with model size.
SBERT-WK: A Sentence Embedding Method by Dissecting BERT-Based Word Models
Bin Wang, C.-C. Jay Kuo. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020.
This work proposes a new sentence embedding method by dissecting BERT-based word models through geometric analysis of the space spanned by the word representation, called SBERT-WK, which achieves the state-of-the-art performance.
SimCSE: Simple Contrastive Learning of Sentence Embeddings
SimCSE is presented, a simple contrastive learning framework that greatly advances state-of-the-art sentence embeddings and regularizes pre-trained embeddings’ anisotropic space to be more uniform, and it better aligns positive pairs when supervised signals are available.
ASBERT: Siamese and Triplet network embedding for open question answering
This work presents ASBERT, a framework built on the BERT architecture that employs Siamese and Triplet neural networks to learn an encoding function that maps a text to a fixed-size vector in an embedded space.
BERT-QPP: Contextualized Pre-trained transformers for Query Performance Prediction
This work proposes to directly fine-tune a contextual embedding, i.e., BERT, specifically for the task of query performance prediction, and shows significantly improved prediction performance compared to all the state-of-the-art methods.
Efficient comparison of sentence embeddings
A sentence embedding algorithm, BERT, is selected as the algorithm of choice and the performance of two vector comparison approaches, FAISS and Elasticsearch, in the problem of sentence embeddings is evaluated.
Utilizing SBERT For Finding Similar Questions in Community Question Answering
An SBERT model for question retrieval in Community Question Answering reduces the time needed to find the most similar question from 795 seconds with BERT to about 0.828 seconds with SBERT, while maintaining BERT's accuracy.
BURT: BERT-inspired Universal Representation from Twin Structure
This work presents BURT (BERT inspired Universal Representation from Twin Structure) that is capable of generating universal, fixed-size representations for input sequences of any granularity, using a large scale of natural language inference and paraphrase data with multiple training objectives.


BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Real-time Inference in Multi-sentence Tasks with Deep Pretrained Transformers
A new architecture, the Poly-encoder, is developed that is designed to approach the performance of the Cross-encoders while maintaining reasonable computation time and achieves state-of-the-art results on both dialogue tasks.
BERTScore: Evaluating Text Generation with BERT
This work proposes BERTScore, an automatic evaluation metric for text generation that correlates better with human judgments and provides stronger model selection performance than existing metrics.
XLNet: Generalized Autoregressive Pretraining for Language Understanding
XLNet is proposed, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and overcomes the limitations of BERT thanks to its autoregressive formulation.
Skip-Thought Vectors
We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the
Attention is All you Need
A new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely is proposed, which generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
Universal Sentence Encoder
It is found that transfer learning using sentence embeddings tends to outperform word level transfer with surprisingly good performance with minimal amounts of supervised training data for a transfer task.
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
A Sentiment Treebank that includes fine grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality, and introduces the Recursive Neural Tensor Network.
Learning Thematic Similarity Metric from Article Sections Using Triplet Networks
It is suggested to leverage the partition of articles into sections, in order to learn thematic similarity metric between sentences, and shows that the triplet network learns useful thematic metrics, that significantly outperform state-of-the-art semantic similarity methods and multipurpose embeddings on the task of thematic clustering of sentences.
SemEval-2014 Task 10: Multilingual Semantic Textual Similarity
This year, the participants were challenged with new data sets for English, as well as the introduction of Spanish, as a new language in which to assess semantic similarity, and the annotations for both tasks leveraged crowdsourcing.