Corpus ID: 18296816

Sense Clustering Using Wikipedia

@inproceedings{Dandala2013SenseCU,
  title={Sense Clustering Using Wikipedia},
  author={Bharath Dandala and Chris Hokamp and Rada Mihalcea and Razvan C. Bunescu},
  booktitle={RANLP},
  year={2013}
}
In this paper, we propose a novel method for generating a coarse-grained sense inventory from Wikipedia using a machine learning framework. Structural and content-based features are employed to induce clusters of articles representative of a word sense. Additionally, multilingual features are shown to improve the clustering accuracy, especially for languages that are less comprehensive than English. We show the effectiveness of our clustering methodology by testing it against both manually and…
Automatic Construction and Evaluation of a Large Semantically Enriched Wikipedia
TLDR
This paper presents the automatic construction and evaluation of a Semantically Enriched Wikipedia (SEW) in which the overall number of linked mentions has been more than tripled solely by exploiting the structure of Wikipedia itself and the wide-coverage sense inventory of BabelNet.
Embedding Words and Senses Together via Joint Knowledge-Enhanced Training
TLDR
This work proposes a new model which learns word and sense embeddings jointly and exploits large corpora and knowledge from semantic networks in order to produce a unified vector space of words and senses.
NASARI: a Novel Approach to a Semantically-Aware Representation of Items
TLDR
A vector representation technique that combines the complementary knowledge of both lexicographic and encyclopedic resources, such as Wikipedia, and attains state-of-the-art performance on multiple datasets in two standard benchmarks: word similarity and sense clustering.
Exploiting multiple languages and resources jointly for high-quality Word Sense Disambiguation and Entity Linking
Definitional knowledge has proved to be essential in various Natural Language Processing tasks and applications, especially when information at the level of word senses is exploited. However, the few…
From senses to texts: An all-in-one graph-based approach for measuring semantic similarity
TLDR
The method first leverages the structural properties of a semantic network in order to model arbitrary linguistic items through a unified probabilistic representation, and then compares the linguistic items in terms of their representations.
An Improved Crowdsourcing Based Evaluation Technique for Word Embedding Methods
TLDR
A crowdsourcing-based word embedding evaluation technique that is more reliable and linguistically justified, and that captures word relatedness based on the word context.
SenseDefs: a multilingual corpus of semantically annotated textual definitions
TLDR
SenseDefs is presented, a large-scale high-quality corpus of disambiguated definitions (or glosses) in multiple languages, comprising sense annotations of both concepts and named entities from a wide-coverage unified sense inventory.
A Large-Scale Multilingual Disambiguation of Glosses
TLDR
A large-scale high-quality corpus of disambiguated glosses in multiple languages, comprising sense annotations of both concepts and named entities from a unified sense inventory is presented.
Towards a Seamless Integration of Word Senses into Downstream NLP Applications
TLDR
It is shown that a simple disambiguation of the input text can lead to consistent performance improvement on multiple topic categorization and polarity detection datasets, particularly when the fine granularity of the underlying sense inventory is reduced and the document is sufficiently large.
Nasari: Integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities
TLDR
A novel multilingual vector representation, called Nasari, is put forward, which not only enables accurate representation of word senses in different languages, but also provides two main advantages over existing approaches: high coverage and comparability across languages and linguistic levels.

References

SHOWING 1-10 OF 23 REFERENCES
Using Wikipedia for Automatic Word Sense Disambiguation
TLDR
A method for generating sense-tagged data using Wikipedia as a source of sense annotations and showing that the Wikipedia-based sense annotations are reliable and can be used to construct accurate sense classifiers is described.
Automatic sense clustering in EuroWordNet
TLDR
This paper addresses ways in which to reduce the fine-grainedness of WordNet and express in a more systematic way the relations between its numerous sense distinctions and the compatibility of the language-specific wordnets in the EuroWordNet multilingual knowledge base.
Meaningful Clustering of Senses Helps Boost Word Sense Disambiguation Performance
TLDR
This paper presents a method for reducing the granularity of the WordNet sense inventory based on the mapping to a manually crafted dictionary encoding sense hierarchies, namely the Oxford Dictionary of English.
Learning to Merge Word Senses
TLDR
A discriminative classifier is trained over a wide variety of features derived from WordNet structure, corpus-based evidence, and evidence from other lexical resources, and a learned similarity measure outperforms previously proposed automatic methods for sense clustering on the task of predicting human sense merging judgments.
Graded Word Sense Assignment
TLDR
Graded word sense assignment is studied based on a recent dataset of graded word sense annotation, and the task of labeling a word in context with the best-fitting sense from a sense inventory such as WordNet is found to be difficult.
Word Sense Disambiguation Improves Information Retrieval
TLDR
This paper proposes a method to estimate sense distributions for short queries and a novel approach to incorporate word senses into the language modeling approach to IR, and also exploits the integration of synonym relations.
Wikify!: linking documents to encyclopedic knowledge
This paper introduces the use of Wikipedia as a resource for automatic keyword extraction and word sense disambiguation, and shows how this online encyclopedia can be used to achieve state-of-the-art…
Learning to link with Wikipedia
TLDR
This paper explains how machine learning can be used to identify significant terms within unstructured text and enrich it with links to the appropriate Wikipedia articles; the approach performs very well, with recall and precision of almost 75%.
Using Encyclopedic Knowledge for Named Entity Disambiguation
TLDR
A disambiguation SVM kernel is trained to exploit the high coverage and rich structure of the knowledge encoded in an online encyclopedia and significantly outperforms a less informed baseline.
Towards Building a Multilingual Semantic Network: Identifying Interlingual Links in Wikipedia
TLDR
This paper creates its own gold standard by sampling translational links from four language pairs using distance heuristics, and manually annotates the sampled translation links to evaluate the output of the method for automatic link detection and correction.