Improving efficiency and accuracy in multilingual entity extraction

Joachim Daiber, Max Jakob, Chris Hokamp, Pablo N. Mendes. I-SEMANTICS '13.
There has recently been an increased interest in named entity recognition and disambiguation systems at major conferences such as WWW, SIGIR, ACL and KDD. […] We compare our solution to the previous system, considering time performance, space requirements and accuracy in the context of the Dutch and English languages. Additionally, we report results for 9 additional languages among the largest Wikipedias. Finally, we present challenges and experiences to foment the discussion with other…

Scalable Disambiguation System Capturing Individualities of Mentions
This paper proposes a new system that learns specialized features and models for disambiguating each ambiguous phrase in the English language using a Wikipedia hyperlink dataset with more than 170 million labelled annotations.
2 State of the Art in Entity Detection (Spotting)
TOMO, an approach to language-aware named entity detection, is presented and evaluated for the German language; the results show that language-dependent features do improve the overall quality of the spotter.
Toponym disambiguation in historical documents using semantic and geographic features
This paper proposes a weakly-supervised method that combines the strengths of both approaches by exploiting both geographic and semantic features in toponym disambiguation. Tested against a historical toponym resolution benchmark, the method improved the state of the art.
Improving Language-Dependent Named Entity Detection
TOMO, an approach to language-aware named entity detection, is presented and evaluated for the German language; the results show that language-dependent features do improve the overall quality of the spotter.
It is shown how the pooling technique is adapted to address the difficulties of gathering annotations for the entity linking task; the task definition, issues encountered during annotation, and a detailed analysis of all the participating systems are also provided.
UBC entity recognition and disambiguation at ERD 2014
This paper describes the system developed at the University of the Basque Country (UBC) for the Entity Recognition and Disambiguation Challenge (ERD 2014), which implements a very basic mention detection component and complements it with a strong disambiguation step based on the Personalized PageRank algorithm.
Entity linking by focusing DBpedia candidate entities
This paper proposes to use text pre-processing and parameter tuning to "focus" a general-purpose EL system to perform better on different kinds of input text, and designs a classifier to automatically classify DBpedia Spotlight's output entities as "NIL" or "Not NIL".
TANKER: Distributed Architecture for Named Entity Recognition and Disambiguation
TANKER, a distributed architecture which aims to overcome scalability, reliability and failure tolerance limitations related to industrial needs by combining NERD systems, relies on a micro-services oriented architecture, which enables agile development and delivery of complex enterprise applications.
An Unsupervised Language-Independent Entity Disambiguation Method and its Evaluation on the English and Persian Languages
Evaluation of ULIED on different English entity linking datasets, as well as on the only available Persian dataset, shows that ULIED in most cases outperforms the state-of-the-art unsupervised multilingual approaches.
FICLONE: Improving DBpedia Spotlight Using Named Entity Recognition and Collective Disambiguation
FICLONE not only substantially improves the performance of DBpedia Spotlight for the NED sub-task but also generally outperforms other state-of-the-art systems.


Evaluating the Impact of Phrase Recognition on Concept Tagging
The impact of the phrase recognition step on the ability of the DBpedia Spotlight system to correctly reproduce the annotations of a gold standard in an unsupervised setting is evaluated.
Learning to link with wikipedia
This paper explains how machine learning can be used to identify significant terms within unstructured text and enrich it with links to the appropriate Wikipedia articles. The resulting approach performs very well, with recall and precision of almost 75%.
DBpedia spotlight: shedding light on the web of documents
DBpedia Spotlight, a system for automatically annotating text documents with DBpedia URIs, is developed, and results are evaluated in light of three baselines and six publicly available annotation systems, demonstrating the competitiveness of the system.
Large Scale Syntactic Annotation of Written Dutch: Lassy
This chapter presents the Lassy Small and Lassy Large treebanks, as well as related tools and applications, which have been developed and made available for syntactically annotated corpora.
A Generative Entity-Mention Model for Linking Entities with Knowledge Base
This paper proposes a generative probabilistic model, called the entity-mention model, which can leverage heterogeneous entity knowledge (including popularity knowledge, name knowledge and context knowledge) for the entity linking task.