LSX_team5 at SemEval-2022 Task 8: Multilingual News Article Similarity Assessment based on Word- and Sentence Mover’s Distance

Authors: Stefan Heil, Karina Kopp, Albin Zehe, Konstantin Kobs, Andreas Hotho
Venue: International Workshop on Semantic Evaluation
This paper introduces our submission to SemEval-2022 Task 8: Multilingual News Article Similarity. The task was to develop a model capable of determining the similarity between pairs of multilingual news articles. To address this challenge, we evaluated the Word Mover’s Distance in conjunction with word embeddings from ConceptNet Numberbatch and term frequencies from WorldLex, as well as the Sentence Mover’s Distance based on sentence embeddings generated by…
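As a rough illustration of the kind of distance the abstract refers to, the sketch below computes the relaxed Word Mover's Distance, a lower bound on the exact Word Mover's Distance in which every word sends all of its mass to its nearest counterpart in the other document. It is not the authors' pipeline: the two-dimensional vectors in `toy_embeddings` are made up (they are not ConceptNet Numberbatch vectors), and the per-word weights are hypothetical stand-ins for weights that could be derived from term frequencies.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def normalize(weights):
    """Scale word weights so that they sum to 1 (the 'mass' of each word)."""
    total = sum(weights.values())
    return {word: value / total for word, value in weights.items()}


def one_sided_cost(source, target, embeddings):
    """Cost when each source word moves all its mass to its nearest target word."""
    return sum(
        mass * min(euclidean(embeddings[word], embeddings[other]) for other in target)
        for word, mass in source.items()
    )


def relaxed_wmd(doc_a, doc_b, embeddings):
    """Relaxed WMD: the larger of the two one-sided costs.

    This is a lower bound on the exact Word Mover's Distance, which would
    require solving an optimal transport problem instead.
    """
    a, b = normalize(doc_a), normalize(doc_b)
    return max(one_sided_cost(a, b, embeddings), one_sided_cost(b, a, embeddings))


# Made-up 2-d embeddings, for illustration only.
toy_embeddings = {
    "president": (1.0, 0.0),
    "leader": (0.9, 0.1),
    "speech": (0.0, 1.0),
    "address": (0.1, 0.9),
    "football": (-1.0, -1.0),
}

# Hypothetical word weights per document (e.g. frequency-derived).
doc_a = {"president": 1.0, "speech": 1.0}
doc_b = {"leader": 1.0, "address": 1.0}
doc_c = {"football": 2.0}

near = relaxed_wmd(doc_a, doc_b, toy_embeddings)
far = relaxed_wmd(doc_a, doc_c, toy_embeddings)
```

With these toy inputs, `near` comes out smaller than `far`, because the words of `doc_b` lie close to those of `doc_a` in the embedding space while `football` does not.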

ConceptNet at SemEval-2017 Task 2: Extending Word Embeddings with Multilingual Relational Knowledge

This paper describes Luminoso's participation in SemEval 2017 Task 2, "Multilingual and Cross-lingual Semantic Word Similarity", with a system based on ConceptNet, which took first place in both subtasks.

SemEval-2022 Task 8: Multilingual news article similarity

A new dataset of nearly 10,000 news article pairs spanning 18 language combinations, annotated for seven dimensions of similarity, is introduced as SemEval 2022 Task 8. Human annotators are shown to be capable of reaching higher correlations, suggesting room for further progress.

Sentence Mover’s Similarity: Automatic Evaluation for Multi-Sentence Texts

This work introduces methods based on sentence mover’s similarity, and finds that sentence-based metrics correlate with human judgments significantly better than ROUGE, both on machine-generated summaries and human-authored essays.

Facebook AI’s WMT21 News Translation Task Submission

Facebook’s multilingual model submission to the WMT2021 shared task on news translation is described: an ensemble of dense and sparse Mixture-of-Expert multilingual translation models, followed by finetuning on in-domain news data and noisy channel reranking.

From Word Embeddings To Document Distances

It is demonstrated on eight real-world document classification data sets, in comparison with seven state-of-the-art baselines, that the Word Mover's Distance metric leads to unprecedentedly low k-nearest neighbor document classification error rates.

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Sentence-BERT (SBERT) is presented: a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings, which can be compared using cosine similarity.
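For reference, cosine similarity between two embedding vectors can be sketched as follows. The vectors below are made-up stand-ins rather than real SBERT outputs, and the function name `cosine_similarity` is chosen here for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Made-up 3-d vectors standing in for sentence embeddings.
sentence_a = (0.2, 0.7, 0.1)
sentence_b = (0.4, 1.4, 0.2)   # same direction as sentence_a, twice the length
sentence_c = (0.7, -0.2, 0.0)  # orthogonal to sentence_a

same_direction = cosine_similarity(sentence_a, sentence_b)
orthogonal = cosine_similarity(sentence_a, sentence_c)
```

Because the measure depends only on direction, `sentence_a` and `sentence_b` score approximately 1 despite their different lengths, while the orthogonal pair scores approximately 0.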

ConceptNet 5.5: An Open Multilingual Graph of General Knowledge

A new version of the linked open data resource ConceptNet is presented that is particularly well suited to be used with modern NLP techniques such as word embeddings, with state-of-the-art results on intrinsic evaluations of word relatedness that translate into improvements on applications of word vectors, including solving SAT-style analogies.

Efficient Estimation of Word Representations in Vector Space

Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.

Worldlex: Twitter and blog word frequencies for 66 languages

New frequencies based on Twitter, blog posts, or newspapers for 66 languages are presented, showing that these frequencies predict lexical decision reaction times similar to the already existing frequencies, or even better than them.

VII. Note on regression and inheritance in the case of two parents

K. Pearson · Mathematics · Proceedings of the Royal Society of London · 1895
Consider a population in which sexual selection and natural selection may or may not be taking place. Assume only that the deviations from the mean in the case of any organ of any generation follow