Sentence-BERT (SBERT) is presented, a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity.
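A minimal sketch of the comparison step, assuming the sentence-transformers package and an off-the-shelf SBERT-style checkpoint; the model name and sentences below are illustrative, not taken from the paper:

```python
# Sketch: encode two sentences and compare their embeddings with cosine similarity.
# Assumes the sentence-transformers package; the checkpoint name is an illustrative choice.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any SBERT-style model would do

sentences = ["A man is playing a guitar.", "Someone plays an instrument."]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity between the two sentence embeddings
score = util.cos_sim(embeddings[0], embeddings[1])
print(float(score))
```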
An easy and efficient method is presented to extend existing sentence embedding models to new languages: the original (monolingual) model generates sentence embeddings for the source language, and a new system is then trained on translated sentences to mimic the original model.
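A rough sketch of such a distillation setup, again assuming the sentence-transformers package; the teacher and student base models and the single parallel pair are illustrative placeholders:

```python
# Sketch: train a multilingual student so its embeddings of a source sentence and its
# translation both match the monolingual teacher's embedding of the source sentence.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, models

teacher = SentenceTransformer("all-MiniLM-L6-v2")          # monolingual teacher (illustrative)

word_emb = models.Transformer("xlm-roberta-base")          # multilingual student base (illustrative)
pooling = models.Pooling(word_emb.get_word_embedding_dimension())
student = SentenceTransformer(modules=[word_emb, pooling])

parallel = [("The cat sits on the mat.", "Die Katze sitzt auf der Matte.")]  # (source, translation)
examples = [InputExample(texts=[src, tgt], label=teacher.encode(src)) for src, tgt in parallel]
loader = DataLoader(examples, batch_size=8, shuffle=True)

# MSE objective: both student embeddings are pulled toward the teacher's source embedding.
train_loss = losses.MSELoss(model=student)
student.fit(train_objectives=[(loader, train_loss)], epochs=1, warmup_steps=10)
```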
This paper evaluates the importance of different network design choices and hyperparameters for five common linguistic sequence tagging tasks and finds that some parameters, such as the pre-trained word embeddings or the last layer of the network, have a large impact on performance, while other parameters, for example the number of LSTM layers or the number of recurrent units, are of minor importance.
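As an illustration only (not the paper's implementation), a small PyTorch tagger in which the hyperparameters in question, the number of LSTM layers and recurrent units, are exposed so different settings can be compared:

```python
# Illustrative BiLSTM tagger with configurable depth and width.
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, tagset_size, emb_dim=100,
                 num_lstm_layers=2, recurrent_units=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)  # ideally initialized from pre-trained vectors
        self.lstm = nn.LSTM(emb_dim, recurrent_units, num_layers=num_lstm_layers,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * recurrent_units, tagset_size)  # last layer: softmax here, CRF in stronger variants

    def forward(self, token_ids):
        hidden, _ = self.lstm(self.embed(token_ids))
        return self.out(hidden)  # per-token tag logits

tagger = BiLSTMTagger(vocab_size=10000, tagset_size=17, num_lstm_layers=2, recurrent_units=100)
logits = tagger(torch.randint(0, 10000, (1, 6)))  # shape: (batch=1, seq_len=6, tagset_size)
```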
This work extensively analyzes different retrieval models and provides several suggestions that may be useful for future work, finding that performing consistently well across all datasets is challenging.
It is shown that reporting a single performance score is insufficient to compare non-deterministic approaches; instead, score distributions based on multiple executions should be compared. Network architectures are presented that both achieve superior performance and are more stable with respect to the remaining hyperparameters.
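A sketch of what comparing score distributions could look like in practice; the scores below are fabricated placeholders, and the Mann-Whitney U test is one possible choice of test rather than necessarily the paper's:

```python
# Sketch: compare score *distributions* from multiple runs instead of a single number.
from statistics import mean, stdev
from scipy.stats import mannwhitneyu

scores_a = [90.1, 90.4, 89.8, 90.6, 90.0, 90.3, 89.9]  # e.g. F1 over 7 seeds (placeholder values)
scores_b = [89.7, 90.9, 89.5, 90.8, 89.4, 91.0, 89.6]

print(f"A: {mean(scores_a):.2f} +/- {stdev(scores_a):.2f}")
print(f"B: {mean(scores_b):.2f} +/- {stdev(scores_b):.2f}")

# Non-parametric test on the two score distributions
stat, p = mannwhitneyu(scores_a, scores_b, alternative="two-sided")
print(f"Mann-Whitney U p-value: {p:.3f}")
```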
For the first time, it is shown how to leverage the power of contextualized word embeddings to classify and cluster topic-dependent arguments, achieving impressive results on both tasks and across multiple datasets.
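A minimal sketch of the clustering side of such a pipeline, assuming a sentence-transformers encoder and scikit-learn's agglomerative clustering; the model name, distance threshold, and example arguments are assumptions, not details from the paper:

```python
# Sketch: embed arguments with a contextualized encoder and group similar ones.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering

arguments = [
    "Nuclear power plants produce low-carbon electricity.",
    "Reactors emit far less CO2 than coal plants.",
    "Storing radioactive waste safely remains unsolved.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(arguments)

clustering = AgglomerativeClustering(n_clusters=None, distance_threshold=1.0)
labels = clustering.fit_predict(embeddings)
print(labels)  # arguments sharing a label are treated as making (roughly) the same point
```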
This work jointly models entity and event coreference, and proposes a neural architecture for cross-document coreference resolution that represents each mention using its lexical span, surrounding context, and relation to entity (event) mentions via predicate-argument structures.
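An illustrative sketch (not the paper's exact architecture) of a pairwise mention scorer that combines span, context, and interaction features with a small PyTorch MLP:

```python
# Sketch: score a pair of mentions from their span vectors, context vectors,
# and an element-wise interaction term.
import torch
import torch.nn as nn

class MentionPairScorer(nn.Module):
    def __init__(self, dim=768, hidden=512):
        super().__init__()
        # Input: [span_a ; context_a ; span_b ; context_b ; span_a * span_b]
        self.mlp = nn.Sequential(
            nn.Linear(5 * dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, span_a, ctx_a, span_b, ctx_b):
        features = torch.cat([span_a, ctx_a, span_b, ctx_b, span_a * span_b], dim=-1)
        return self.mlp(features)  # higher score = more likely coreferent

scorer = MentionPairScorer()
d = 768
score = scorer(torch.randn(1, d), torch.randn(1, d), torch.randn(1, d), torch.randn(1, d))
```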
This work presents a simple yet efficient data augmentation strategy called Augmented SBERT, where the cross-encoder is used to label a larger set of input pairs to augment the training data for the bi-encoder, and shows that selecting the sentence pairs in this process is non-trivial and crucial for the success of the method.
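A sketch of the augmentation loop, assuming the sentence-transformers CrossEncoder and SentenceTransformer APIs; the model names, unlabeled pairs, and training settings are illustrative assumptions:

```python
# Sketch: a cross-encoder silver-labels extra pairs, which then augment bi-encoder training.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, CrossEncoder, InputExample, losses

# 1) A cross-encoder (fine-tuned on gold pairs elsewhere) labels additional, unlabeled pairs.
cross_encoder = CrossEncoder("cross-encoder/stsb-roberta-base")
unlabeled_pairs = [("A man is eating food.", "A man is eating a meal."),
                   ("A man is eating food.", "A plane is taking off.")]
silver_scores = cross_encoder.predict(unlabeled_pairs)

# 2) The silver-labeled pairs augment the bi-encoder's training data.
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
examples = [InputExample(texts=[a, b], label=float(s))
            for (a, b), s in zip(unlabeled_pairs, silver_scores)]
loader = DataLoader(examples, batch_size=16, shuffle=True)
bi_encoder.fit(train_objectives=[(loader, losses.CosineSimilarityLoss(bi_encoder))], epochs=1)
```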
This paper proposes a new annotation scheme to anchor events in time whose annotation effort is much lower, as it scales linearly with the number of events, and which gives a more precise anchoring of when the events happened, as the complete document can be taken into account.
This paper proposes AdapterDrop, which removes adapters from lower transformer layers during training and inference; the approach incorporates concepts from all three directions and can dynamically reduce the computational overhead when performing inference over multiple tasks simultaneously, with minimal decrease in task performance.
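An illustrative PyTorch sketch of the core idea (not the AdapterDrop implementation itself): adapters in the lowest layers are replaced by identity modules, so those layers add no adapter computation:

```python
# Sketch: build per-layer adapters, dropping them from the first n_drop layers.
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(F.relu(self.down(x)))  # residual bottleneck adapter

def build_adapters(num_layers=12, n_drop=5, dim=768):
    # The first n_drop transformer layers get no adapter (identity); the rest get one.
    return nn.ModuleList(
        [nn.Identity() if layer < n_drop else Adapter(dim) for layer in range(num_layers)]
    )

adapters = build_adapters()
print([type(m).__name__ for m in adapters])  # Identity for dropped layers, Adapter elsewhere
```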