Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Sentence-BERT (SBERT) is presented: a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings, which can be compared using cosine similarity.
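The comparison step the abstract mentions — scoring fixed-size sentence embeddings with cosine similarity — can be sketched as follows. This is a minimal illustration using NumPy; the vectors are placeholders, not actual SBERT outputs.

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two sentence embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Placeholder embeddings standing in for SBERT sentence vectors.
emb_a = np.array([0.1, 0.3, -0.2])
emb_b = np.array([0.2, 0.25, -0.1])
score = cosine_similarity(emb_a, emb_b)  # higher = more semantically similar
```

In practice, the embeddings would come from an SBERT encoder; the scoring itself is exactly this dot-product-over-norms computation.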
Making Monolingual Sentence Embeddings Multilingual Using Knowledge Distillation
An easy and efficient method is presented to extend existing sentence embedding models to new languages: the original (monolingual) model generates sentence embeddings for the source language, and a new system is then trained on translated sentences to mimic the original model.
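The distillation objective described above can be sketched as a mean-squared-error loss pulling the student's embeddings — for both the source sentence and its translation — toward the teacher's source-sentence embedding. This is an illustrative sketch with NumPy arrays standing in for model outputs; the function name is hypothetical.

```python
import numpy as np

def distillation_loss(teacher_src: np.ndarray,
                      student_src: np.ndarray,
                      student_tgt: np.ndarray) -> float:
    """MSE pulling the student's source-sentence and translated-sentence
    embeddings toward the teacher's source-sentence embedding, so that
    translations map to the same point in the embedding space."""
    return float(np.mean((teacher_src - student_src) ** 2)
                 + np.mean((teacher_src - student_tgt) ** 2))
```

The loss is zero exactly when the student reproduces the teacher's embedding for both the source sentence and its translation.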
Optimal Hyperparameters for Deep LSTM-Networks for Sequence Labeling Tasks
This paper evaluates the importance of different network design choices and hyperparameters for five common linguistic sequence tagging tasks and finds that some parameters, such as the pre-trained word embeddings or the last layer of the network, have a large impact on performance, while others, such as the number of LSTM layers or the number of recurrent units, are of minor importance.
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models
- Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, Iryna Gurevych
- Computer Science · NeurIPS Datasets and Benchmarks
- 17 April 2021
This work extensively analyzes different retrieval models, finding that performing well consistently across all datasets is challenging, and provides several suggestions that may be useful for future work.
Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging
It is shown that reporting a single performance score is insufficient to compare non-deterministic approaches; instead, it is proposed to compare score distributions based on multiple executions. Network architectures are presented that produce superior performance and are more stable with respect to the remaining hyperparameters.
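The proposal above — reporting a distribution over multiple runs rather than a single score — amounts to summarizing repeated executions, e.g. by mean and standard deviation. A minimal stdlib sketch (function name is illustrative):

```python
import statistics

def summarize_runs(scores: list[float]) -> tuple[float, float]:
    """Summarize a score distribution from multiple non-deterministic
    training runs (e.g. different seeds) instead of a single score."""
    return statistics.mean(scores), statistics.stdev(scores)
```

Reporting both values lets two non-deterministic approaches be compared on their full score distributions rather than on one lucky (or unlucky) run.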
Classification and Clustering of Arguments with Contextualized Word Embeddings
- Nils Reimers, Benjamin Schiller, Tilman Beck, Johannes Daxenberger, Christian Stab, Iryna Gurevych
- Computer Science · ACL
- 27 May 2019
For the first time, it is shown how to leverage the power of contextualized word embeddings to classify and cluster topic-dependent arguments, achieving impressive results on both tasks and across multiple datasets.
Revisiting Joint Modeling of Cross-document Entity and Event Coreference Resolution
- Shany Barhom, Vered Shwartz, Alon Eirew, M. Bugert, Nils Reimers, Ido Dagan
- Computer Science · ACL
- 27 May 2019
This work jointly models entity and event coreference and proposes a neural architecture for cross-document coreference resolution that represents each mention by its lexical span, surrounding context, and relation to entity (event) mentions via predicate-argument structures.
Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks
- Nandan Thakur, Nils Reimers, Johannes Daxenberger, Iryna Gurevych
- Computer Science · NAACL
- 16 October 2020
This work presents a simple yet efficient data augmentation strategy called Augmented SBERT, where the cross-encoder is used to label a larger set of input pairs to augment the training data for the bi-encoder, and shows that selecting the sentence pairs in this process is non-trivial and crucial for the success of the method.
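The augmentation step described above can be sketched as a labeling loop: a cross-encoder scores unlabeled sentence pairs, and the scored ("silver") pairs are appended to the gold training data for the bi-encoder. This is a schematic sketch; the function and argument names are hypothetical, and `cross_encoder_score` stands in for a trained cross-encoder's prediction function.

```python
from typing import Callable

def augment_training_data(
    cross_encoder_score: Callable[[str, str], float],
    unlabeled_pairs: list[tuple[str, str]],
    gold_pairs: list[tuple[str, str, float]],
) -> list[tuple[str, str, float]]:
    """Label sentence pairs with the cross-encoder's similarity score
    ('silver' data) and combine them with the gold training pairs
    used to train the bi-encoder."""
    silver = [(a, b, cross_encoder_score(a, b)) for a, b in unlabeled_pairs]
    return gold_pairs + silver
```

As the abstract notes, which unlabeled pairs are fed into this loop is itself a non-trivial choice, so a real pipeline would put a pair-selection strategy in front of this step.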
Temporal Anchoring of Events for the TimeBank Corpus
This paper proposes a new annotation scheme to anchor events in time whose annotation effort is much lower, as it scales linearly with the number of events, and which gives a more precise anchoring of when the events happened, as the complete document can be taken into account.
AdapterDrop: On the Efficiency of Adapters in Transformers
This paper proposes AdapterDrop, which removes adapters from lower transformer layers during training and inference; it incorporates concepts from all three directions and can dynamically reduce the computational overhead when performing inference over multiple tasks simultaneously, with minimal decrease in task performance.
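The core mechanism — skipping adapter modules in the first n transformer layers while keeping them in the upper layers — can be sketched as a forward loop. This is a schematic sketch under simplifying assumptions (layers and adapters modeled as plain callables); it is not the paper's implementation.

```python
from typing import Callable

def forward_with_adapterdrop(
    hidden: float,
    layers: list[Callable[[float], float]],
    adapters: list[Callable[[float], float]],
    drop_first_n: int,
) -> float:
    """Run transformer layers in order, applying the per-layer adapter
    only from layer drop_first_n onward (AdapterDrop): the adapters in
    the first drop_first_n layers are skipped, saving their compute."""
    for i, (layer, adapter) in enumerate(zip(layers, adapters)):
        hidden = layer(hidden)
        if i >= drop_first_n:
            hidden = adapter(hidden)
    return hidden
```

With `drop_first_n = 0` this reduces to ordinary adapter-based inference; raising it trades a small amount of task performance for lower inference cost.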