Building an Icelandic Entity Linking Corpus

@article{Fririksdttir2022BuildingAI,
  title={Building an Icelandic Entity Linking Corpus},
  author={Steinunn Rut Friðriksd{\'o}ttir and Valdimar {\'A}g{\'u}st Eggertsson and Benedikt Geir J{\'o}hannesson and Hjalti Dan{\'i}elsson and Hrafn Loftsson and Hafsteinn Þ{\'o}r Einarsson},
  journal={ArXiv},
  year={2022},
  volume={abs/2206.05014}
}
In this paper, we present the first Entity Linking corpus for Icelandic. We describe our approach of using a multilingual entity linking model (mGENRE) in combination with Wikipedia API Search (WAPIS) to label our data and compare it to an approach using WAPIS only. We find that our combined method reaches 53.9% coverage on our corpus, compared to 30.9% using only WAPIS. We analyze our results and explain the value of using a multilingual system when working with Icelandic. Additionally, we… 
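The comparison described in the abstract can be illustrated with a small sketch. It assumes that "coverage" means the fraction of mentions that receive some link, and that the combined method tries one linker first and falls back to the other; neither the metric definition nor the order of the two systems is confirmed by the abstract. The lookups below are toy stand-ins with placeholder identifiers, not real mGENRE or Wikipedia API calls, and the names `link_with_fallback` and `coverage` are hypothetical.

```python
from typing import Callable, Optional, Sequence

# A linker maps a mention string to an entity identifier, or None if nothing is found.
Linker = Callable[[str], Optional[str]]


def link_with_fallback(mention: str, primary: Linker, fallback: Linker) -> Optional[str]:
    """Try the primary linker first; consult the fallback only if it returns None."""
    result = primary(mention)
    return result if result is not None else fallback(mention)


def coverage(mentions: Sequence[str], linker: Linker) -> float:
    """Fraction of mentions for which the linker returns some entity (assumed definition)."""
    if not mentions:
        return 0.0
    linked = sum(1 for m in mentions if linker(m) is not None)
    return linked / len(mentions)


# Toy stand-ins with placeholder identifiers (assumptions for illustration only).
toy_model: Linker = {"Reykjavík": "ENTITY_A"}.get
toy_search: Linker = {"Ísland": "ENTITY_B"}.get

mentions = ["Reykjavík", "Ísland", "Esjan", "Hekla"]

search_only = coverage(mentions, toy_search)
combined = coverage(mentions, lambda m: link_with_fallback(m, toy_model, toy_search))

print(f"search only: {search_only:.1%}, combined: {combined:.1%}")
```

With this toy data the fallback combination links two of the four mentions while search alone links one, which mirrors the direction of the reported difference (53.9% versus 30.9%) without reproducing the paper's pipeline.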

References

Developing a PoS-tagged corpus using existing tools

The development of a new tagged corpus of Icelandic, consisting of about 1 million tokens, is described. The corpus is intended as a new gold standard for training and testing PoS taggers; the paper also discusses the problems that emerged and highlights which software tools were found to be useful.

Entity Linking in 100 Languages

A new formulation for multilingual entity linking is proposed, in which language-specific mentions resolve to a language-agnostic Knowledge Base; the model outperforms state-of-the-art results from a far more limited cross-lingual linking task.

Named Entity Recognition for Icelandic: Annotated Corpus and Models

This work has created the first NER corpus for Icelandic by annotating 48,371 named entities (NEs) using eight NE types, in a text corpus of 1 million tokens, to train three machine learning models.

Cross-Language Entity Linking

A new test collection is created to evaluate cross-language entity linking performance in twenty-one languages, and experiments are presented that examine issues such as the importance of transliteration, the utility of cross-language information retrieval, and the potential benefit of multilingual named entity recognition.

Reddit Entity Linking Dataset

Building a Cross-Language Entity Linking Collection in Twenty-One Languages

An efficient way to create a test collection for evaluating the accuracy of cross-language entity linking is described; the resulting collection includes approximately 55,000 queries, comprising between 875 and 4,329 queries for each of twenty-one non-English languages.

Overview of TAC-KBP2015 Tri-lingual Entity Discovery and Linking

An overview of the task definition, annotation issues, successful methods and research challenges associated with this new end-to-end Tri-lingual entity discovery and linking task at the Knowledge Base Population (KBP) track at TAC2015 is given.

Robust Disambiguation of Named Entities in Text

A robust method for collective disambiguation is presented that harnesses context from knowledge bases and uses a new form of coherence graph; it significantly outperforms prior methods in terms of accuracy, with robust behavior across a variety of inputs.

Multilingual Autoregressive Entity Linking

We present mGENRE, a sequence-to-sequence system for the Multilingual Entity Linking (MEL) problem, the task of resolving language-specific mentions to a multilingual Knowledge Base (KB). For a…

MEANTIME, the NewsReader Multilingual Event and Time Corpus

A procedure was devised to automatically project the annotations on the English texts onto the translated texts, based on the manual alignment of the annotated elements, which enabled us to speed up the annotation process but also provided cross-lingual coreference.