Cross-lingual Name Tagging and Linking for 282 Languages

Xiaoman Pan, Boliang Zhang, Jonathan May, Joel Nothman, Kevin Knight, Heng Ji
Annual Meeting of the Association for Computational Linguistics
The ambitious goal of this work is to develop a cross-lingual name tagging and linking framework for 282 languages that exist in Wikipedia. We achieve this goal by applying a series of new KB mining methods: generating “silver-standard” annotations by transferring annotations from English to other languages through cross-lingual links and KB properties, refining annotations through self-training and topic selection, deriving language-specific morphology features from anchor links, and mining…
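The annotation-transfer step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `LANGLINKS` and `ENGLISH_TYPES` tables, the `project_labels` helper, the tokenization, and the BIO tag scheme are all hypothetical, chosen only to show how an entity type known for an English page might be projected onto anchor text in another language through a cross-lingual link.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy data, not taken from the paper.
# Target-language page title -> English page title (cross-lingual link).
LANGLINKS: Dict[str, str] = {
    "Türkiye": "Turkey",
    "Ankara": "Ankara",
}
# English page title -> entity type known on the English side.
ENGLISH_TYPES: Dict[str, str] = {
    "Turkey": "GPE",
    "Ankara": "GPE",
}

# An anchor is (start token, end token exclusive, linked page title).
Anchor = Tuple[int, int, str]


def project_labels(tokens: List[str], anchors: List[Anchor]) -> List[str]:
    """Tag tokens covered by anchors whose type can be projected from English."""
    tags = ["O"] * len(tokens)
    for start, end, title in anchors:
        english_title: Optional[str] = LANGLINKS.get(title)
        entity_type = ENGLISH_TYPES.get(english_title) if english_title else None
        if entity_type is None:
            continue  # no cross-lingual link or no known type: leave as "O"
        tags[start] = f"B-{entity_type}"
        for i in range(start + 1, end):
            tags[i] = f"I-{entity_type}"
    return tags


# Assumed tokenization of a Turkish sentence with two linked anchors.
tokens = ["Ankara", ",", "Türkiye", "'nin", "başkentidir", "."]
anchors = [(0, 1, "Ankara"), (2, 3, "Türkiye")]
silver_tags = project_labels(tokens, anchors)
```

Anchors with no cross-lingual link or no known type are left untagged in this sketch, so the resulting labels are incomplete. Refinement steps such as the self-training and topic selection mentioned in the abstract would be natural follow-ups.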


Design Challenges in Low-resource Cross-lingual Entity Linking

This paper argues that, in the low-resource language setting, cross-lingual resources from outside Wikipedia are essential, and proposes CogCompXEL, a simple but effective zero-shot framework that complements current methods by utilizing query log mapping files from online search engines.

Towards Zero-resource Cross-lingual Entity Linking

This work examines the effect of resource assumptions and quantifies how much the availability of these resources affects the overall quality of existing XEL systems. It proposes three improvements to both entity candidate generation and disambiguation that make better use of the limited resources available in resource-scarce scenarios.

Multi-lingual Entity Discovery and Linking

This tutorial discusses and compares multiple methods that make use of multi-lingual word embeddings and presents EL methods that work for both name tagging and linking in very low-resource languages.

Neural Cross-Lingual Entity Linking

This paper proposes a neural EL model that trains fine-grained similarities and dissimilarities between the query and candidate document from multiple perspectives, combined with convolution and tensor networks, and shows that this English-trained system can be applied, in zero-shot learning, to other languages by making surprisingly effective use of multi-lingual embeddings.

XLEnt: Mining a Large Cross-lingual Entity Dataset with Lexical-Semantic-Phonetic Word Alignment

This work proposes Lexical-Semantic-Phonetic Align (LSP-Align), a technique to automatically mine cross-lingual entity lexica from mined web data and releases the massively multilingual tagged named entity corpus as a resource to the NLP community.

Improving Candidate Generation for Low-resource Cross-lingual Entity Linking

This paper assesses the problems faced by current entity candidate generation methods for low-resource XEL, then proposes three improvements that reduce the disconnect between entity mentions and KB entries and improve the robustness of the model to low-resource scenarios.

Pivot-based Candidate Retrieval for Cross-lingual Entity Linking

The proposed pivot-based approach takes an intermediary set of plausible target-language mentions as pivots to bridge two types of gaps, the cross-lingual gap and the mention-entity gap, and outperforms both the lexicon-based and semantic-based approaches.

Linking Named Entities across Languages using Multilingual Word Embeddings

This paper describes an XEL system applied to and evaluated on several language pairs, including English and various low-resource languages of different linguistic families such as Croatian, Finnish, Estonian, and Slovenian. The approach is tested by analyzing documents and NEs in low-resource languages and linking them to the English version of Wikipedia.

Entity Linking in 100 Languages

A new formulation for multilingual entity linking is proposed, in which language-specific mentions resolve to a language-agnostic Knowledge Base; the model outperforms state-of-the-art results from a far more limited cross-lingual linking task.

Zero-Shot Cross-lingual Name Retrieval for Low-Resource Languages

A novel method is presented which is able to perform name retrieval on a new language without any additional training data by leveraging freely available, cross-lingual resources and a small amount of training data from another language.



Cross-Lingual Named Entity Recognition via Wikification

A language-independent method for NER is introduced, building on cross-lingual wikification, a technique that grounds words and phrases in non-English text into English Wikipedia entries, yielding strong language-independent features.

Building a Cross-Language Entity Linking Collection in Twenty-One Languages

An efficient way to create a test collection for evaluating the accuracy of cross-language entity linking is described, which includes approximately 55,000 queries, comprising between 875 and 4,329 queries for each of twenty-one non-English languages.

Cross-Language Entity Linking

A new test collection is created to evaluate cross-language entity linking performance in twenty-one languages, and experiments are presented that examine issues such as the importance of transliteration, the utility of cross-language information retrieval, and the potential benefit of multilingual named entity recognition.

Learning multilingual named entity recognition from Wikipedia

Mining Wiki Resources for Multilingual Named Entity Recognition

A system by which the multilingual characteristics of Wikipedia can be utilized to annotate a large corpus of text with Named Entity Recognition (NER) tags requiring minimal human intervention and no linguistic expertise is described.

Joint bilingual name tagging for parallel corpora

Two novel approaches to jointly and consistently extract names from parallel corpora using standard linear-chain Conditional Random Fields as the learning framework, incorporating cross-lingual features propagated between two languages are proposed.

Automatic Creation of Arabic Named Entity Annotated Corpus Using Wikipedia

A new methodology to exploit Wikipedia features and structure to automatically develop an Arabic NE annotated corpus is proposed and a filtering algorithm is developed to eliminate ambiguity when tagging candidate NEs.

Name Tagging for Low-resource Incident Languages based on Expectation-driven Learning

This paper tackles a challenging name tagging problem in an emergent setting, where the tagger needs to be complete within a few hours for a new incident language (IL) using very few resources, and proposes a new expectation-driven learning framework that rapidly acquires, categorizes, structures, and zooms in on IL-specific expectations.

Joint Word Alignment and Bilingual Named Entity Recognition Using Dual Decomposition

A graphical model is presented that performs bilingual NER tagging jointly with word alignment, by combining two monolingual tagging models with two unidirectional alignment models, and a dual decomposition inference algorithm is designed to perform joint decoding over the combined alignment and NER output space.

One for All: Towards Language Independent Named Entity Linking

LIEL, a Language Independent Entity Linking system, is introduced; it provides an EL framework that, once trained on one language, works remarkably well on a number of different languages without change.