MENTA: Inducing Multilingual Taxonomies from Wikipedia

@inproceedings{deMelo2010MENTAIM,
  title={{MENTA}: Inducing Multilingual Taxonomies from {Wikipedia}},
  author={Gerard de Melo and Gerhard Weikum},
  booktitle={Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM)},
  year={2010}
}
  • Gerard de Melo, Gerhard Weikum
  • Published 26 October 2010
  • Computer Science
  • Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM)
In recent years, a number of projects have turned to Wikipedia to establish large-scale taxonomies that describe orders of magnitude more entities than traditional manually built knowledge bases. […] We rely on linking heuristics to discover potential taxonomic relationships, graph partitioning to form consistent equivalence classes of entities, and a Markov chain-based ranking approach to construct the final taxonomy. This results in MENTA (Multilingual Entity Taxonomy), a resource that describes…
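The abstract names the pipeline's two graph steps concretely enough to sketch: partitioning entities linked across languages into equivalence classes, and ranking candidate parents with a Markov chain-based score. The Python sketch below illustrates the general idea only; union-find as the partitioning mechanism, the toy link and edge data, and the damped random-walk scoring are all assumptions for illustration, not MENTA's actual algorithm or data.

```python
# Illustrative sketch (not MENTA's implementation): (1) group articles
# connected by interlanguage links into equivalence classes via union-find,
# one common way to realize graph partitioning; (2) score nodes of a small
# candidate-parent graph with a damped random walk, loosely analogous to a
# Markov chain-based ranking. All data below is invented for illustration.
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure over arbitrary hashable identifiers."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


# Step 1: equivalence classes from (toy) interlanguage links.
links = [("en:Guitar", "de:Gitarre"), ("de:Gitarre", "fr:Guitare"),
         ("en:Violin", "fr:Violon")]
uf = UnionFind()
for a, b in links:
    uf.union(a, b)
classes = defaultdict(set)
for article in {x for pair in links for x in pair}:
    classes[uf.find(article)].add(article)
print(sorted(map(sorted, classes.values())))
# [['de:Gitarre', 'en:Guitar', 'fr:Guitare'], ['en:Violin', 'fr:Violon']]

# Step 2: rank candidates via a damped random walk over weighted
# entity -> candidate-parent edges (weights stand in for heuristic evidence).
edges = {
    "Guitar": {"String instrument": 0.8, "Musical instrument": 0.2},
    "String instrument": {"Musical instrument": 1.0},
}

def random_walk_scores(edges, damping=0.85, iters=50):
    nodes = set(edges) | {t for tgts in edges.values() for t in tgts}
    score = dict.fromkeys(nodes, 1.0 / len(nodes))
    for _ in range(iters):
        nxt = dict.fromkeys(nodes, (1 - damping) / len(nodes))
        for src, tgts in edges.items():
            total = sum(tgts.values())
            for tgt, w in tgts.items():
                nxt[tgt] += damping * score[src] * (w / total)
        score = nxt  # dangling nodes leak mass; acceptable in a toy sketch
    return score

print(max(random_walk_scores(edges).items(), key=lambda kv: kv[1]))
# 'Musical instrument' accumulates the most probability mass.
```

In the full system, the edge weights would come from the paper's linking heuristics rather than hand-set constants.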

Citations

280 Birds with One Stone: Inducing Multilingual Taxonomies from Wikipedia using Character-level Classification
TLDR
This work proposes a simple yet effective approach to inducing multilingual taxonomies from Wikipedia that leverages Wikipedia's interlanguage links, followed by character-level classifiers, to induce high-precision, high-coverage taxonomies in other languages.
Automatic acquisition of taxonomies in different languages from multiple Wikipedia versions
TLDR
The adaptation of existing heuristics is described, making it possible to extract large sets of hyponymy relations from multiple Wikipedia versions with little information about each language.
Graph-based methods for large-scale multilingual knowledge integration
TLDR
Novel methods are investigated for automatically building large repositories of knowledge that capture semantic relationships between words, names, and entities in many different languages, each involving graph algorithms and statistical techniques that combine evidence from multiple sources of information.
UWN: A Large Multilingual Lexical Knowledge Base
TLDR
This paper explains how link prediction, information integration and taxonomy induction methods have been used to build UWN based on WordNet and extend it with millions of named entities from Wikipedia.
Discovering Cross-language Links in Wikipedia through Semantic Relatedness
TLDR
WIKICL is proposed, an algorithm for discovering cross-language links using the semantic relatedness of two articles derived from the Wikipedia graph structure; it achieves comparable, and in some cases better, results than previous methods in much less computational time.
Unsupervised learning of an extensive and usable taxonomy for DBpedia
TLDR
An unsupervised approach is presented that automatically learns a taxonomy from the Wikipedia category system and extensively assigns types to DBpedia entities, through the combination of several interdisciplinary techniques, which provides a robust backbone for DBpedia knowledge and has the benefit of being easy to understand for end users.
Towards Building a Multilingual Semantic Network: Identifying Interlingual Links in Wikipedia
TLDR
The authors create their own gold standard by sampling translation links from four language pairs using distance heuristics, and manually annotate the sampled links to evaluate the output of their method for automatic link detection and correction.
Automatic Taxonomy Extraction in Different Languages Using Wikipedia and Minimal Language-Specific Information
TLDR
This work describes a method for extracting a large set of hyponymy relations from the Wikipedia category system that can be used to acquire taxonomies in multiple languages, along with a set of 20 features that can be used for hyponymy detection without additional language-specific corpora.

References

Showing 1-10 of 46 references
Large-Scale Taxonomy Mapping for Restructuring and Integrating Wikipedia
We present a knowledge-rich methodology for disambiguating Wikipedia categories with WordNet synsets and using this semantic information to restructure a taxonomy automatically generated from the…
Untangling the Cross-Lingual Link Structure of Wikipedia
TLDR
An algorithm with provable properties that uses linear programming and a region growing technique to tackle the problem of large numbers of imprecise or simply wrong Wikipedia links is presented.
Semantic Taxonomy Induction from Heterogenous Evidence
TLDR
This work proposes a novel algorithm for inducing semantic taxonomies that flexibly incorporates evidence from multiple classifiers over heterogeneous relationships to optimize the entire structure of the taxonomy, using knowledge of a word's coordinate terms to help determine its hypernyms, and vice versa.
Minimally Supervised Multilingual Taxonomy and Translation Lexicon Induction
We present a novel algorithm for the acquisition of multilingual lexical taxonomies (including hyponymy/hypernymy, meronymy and taxonomic cousinhood), from monolingual corpora with minimal…
Named Entity WordNet
TLDR
This paper presents the automatic extension of Princeton WordNet with Named Entities (NEs) and explores different aspects of the methodology, such as the treatment of polysemous terms, the identification of hyponyms within the Wikipedia categorization system, the identification of Wikipedia articles that are NEs, and the design of an NE repository compliant with the LMF ISO standard.
Taxonomy Learning Using Word Sense Induction
TLDR
An unsupervised method is presented that builds a taxonomy of senses learned automatically from an unlabelled corpus; it captures more correct taxonomic relations than traditional distributional similarity approaches, which merge senses by grouping the features of each word into a single vector.
Building a Multilingual Lexical Resource for Named Entity Disambiguation, Translation and Transliteration
TLDR
HeiNER contains 1,547,586 disambiguated English Named Entities together with translations and transliterations into 15 languages, and provides linguistic contexts for every NE in all target languages, which makes it a valuable resource for multilingual Named Entity Recognition, Disambiguation and Classification.
Enriching the crosslingual link structure of Wikipedia - A classification-based approach
TLDR
This paper presents a classification-based approach with the goal of inferring new cross-language links in Wikipedia, and shows that this approach has a recall of 70% with a precision of 94% for the task of learning cross-language links on a test dataset.
Using the Multilingual Central Repository for Graph-Based Word Sense Disambiguation
TLDR
A graph-based method for performing knowledge-based Word Sense Disambiguation (WSD) is applied to the Multilingual Central Repository, indirectly validating the contents of the MCR and obtaining state-of-the-art results.
WikiNet: A Very Large Scale Multi-Lingual Concept Network
TLDR
A multi-lingual concept network obtained automatically by mining for concepts and relations and exploiting a variety of sources of knowledge from Wikipedia is described.