Corpus ID: 12022866

WordNet improves text document clustering

@inproceedings{Hotho2003WordNetIT,
  title={WordNet improves text document clustering},
  author={Andreas Hotho and Steffen Staab and Gerd Stumme},
  booktitle={SIGIR 2003},
  year={2003}
}
Text document clustering plays an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters. The bag-of-words representation used for these clustering methods is often unsatisfactory because it ignores relationships between important terms that do not co-occur literally. To deal with this problem, we integrate background knowledge (in our application, WordNet) into the process of clustering…
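The idea in the abstract, that a bag-of-words vector misses related terms which never co-occur literally, can be illustrated with a minimal sketch. The abstract is truncated, so this is not the authors' procedure: it assumes one plausible strategy (adding hypernym concept counts to the term counts) and uses a small hypothetical dictionary, `TOY_HYPERNYMS`, in place of WordNet. The function names `hypernym_closure`, `enrich`, and `cosine` are likewise illustrative.

```python
import math
from collections import Counter

# Hypothetical toy stand-in for WordNet: term -> direct hypernym concepts.
TOY_HYPERNYMS = {
    "beef": ["meat"],
    "pork": ["meat"],
    "meat": ["food"],
    "apple": ["fruit"],
    "fruit": ["food"],
}


def hypernym_closure(term, depth):
    """Collect concepts reachable from `term` within `depth` hypernym steps."""
    found, frontier = [], [term]
    for _ in range(depth):
        frontier = [h for t in frontier for h in TOY_HYPERNYMS.get(t, [])]
        found.extend(frontier)
    return found


def enrich(tokens, depth=1):
    """Return term counts plus counts for the concepts the terms map to."""
    term_counts = Counter(tokens)
    vector = Counter(term_counts)
    for term, count in term_counts.items():
        for concept in hypernym_closure(term, depth):
            # Prefix concepts so they never collide with literal terms.
            vector["concept:" + concept] += count
    return vector


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


doc_a = ["beef", "grill"]
doc_b = ["pork", "roast"]

plain = cosine(Counter(doc_a), Counter(doc_b))
enriched = cosine(enrich(doc_a), enrich(doc_b))
print(plain, enriched)
```

The two toy documents share no literal term, so their plain similarity is zero; after enrichment both contain the shared concept `concept:meat`, which gives them a nonzero similarity that a clustering algorithm could then use.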
Text Document Clustering based on Semantics
Text document clustering plays an important role in providing intuitive navigation and browsing mechanisms by organizing large sets of documents into a small number of meaningful clusters. Clustering…
Text Document Clustering based on Semantics
TLDR
This model combines phrase analysis and word analysis with WordNet as background knowledge and NLP to explore better document representations for clustering, with the aim of improving web document clustering.
WordNet-based text document clustering
TLDR
In this research, naive, syntax-based disambiguation is attempted by assigning each word a part-of-speech tag and by enriching the 'bag-of-words' data representation often used for document clustering with synonyms and hypernyms from WordNet.
Directory for Improving the Performance of Text Document Clustering by
In recent years, we have witnessed a tremendous growth in the volume of text documents available on the Internet, digital libraries, news sources, company-wide intranets and so on. This has led to an…
Wikipedia-Based Smoothing for Enhancing Text Clustering
TLDR
A language modeling approach for text clustering is adopted and the contents of Wikipedia articles as well as their assigned categories are used in three different ways to smooth the document language models with the goal of enriching the document contents.
Enhancing text clustering by leveraging Wikipedia semantics
TLDR
A way to build a concept thesaurus based on the semantic relations (synonym, hypernym, and associative relation) extracted from Wikipedia is proposed, and a unified framework is developed that leverages these semantic relations to enhance the traditional content similarity measure for text clustering.
A new unsupervised method for document clustering by using WordNet lexical and conceptual relations
TLDR
This unsupervised method uses ANNIE, WordNet lexical categories, and the WordNet ontology to create a well-structured document vector space whose low dimensionality allows common clustering algorithms to perform well.
A Study of the Effect of Document Representations in Clustering-Based Cross-Document Coreference Resolution
  • Horacio Saggion
  • Computer Science
  • Multi-source, Multilingual Information Extraction and Summarization
  • 2013
TLDR
This work describes experiments aiming at identifying the contribution of semantic information and summarization in a cross-document coreference resolution system that uses a clustering-based algorithm to group documents referring to the same entity.
Similarity Measures for Text Document Clustering
Clustering is a useful technique that organizes a large quantity of unordered text documents into a small number of meaningful and coherent clusters, thereby providing a basis for intuitive and…
An integrated probabilistic text clustering model with segment-based and word order evidence
  • Lin Dai
  • Computer Science
  • 2011 7th International Conference on Advanced Information Management and Service (ICIPM)
  • 2011
TLDR
An integrated probabilistic model is proposed that explicitly combines the evidence from individual segments within a document with word order information, and a text clustering framework is built on this model.

References

SHOWING 1-10 OF 27 REFERENCES
Text Clustering Based on Background Knowledge
TLDR
Partitional clustering first reduces the size of the problem so that it becomes tractable for conceptual clustering, which then facilitates the understanding of the results.
Explaining Text Clustering Results Using Semantic Structures
TLDR
A way of integrating a large thesaurus and the computation of lattices of resulting clusters into common text clustering in order to overcome the problems of semantically nearby terms and how resulting clusters are related to each other is discussed.
Using WordNet to Complement Training Information in Text Categorization
TLDR
This work integrates WordNet information with two training approaches through the Vector Space Model and shows that the integration of WordNet clearly outperforms training approaches, and that an integrated technique can effectively address the classification of low frequency categories.
Document clustering with committees
TLDR
A new evaluation methodology that is based on the editing distance between output clusters and manually constructed classes (the answer key) is presented, which is more intuitive and easier to interpret than previous evaluation measures.
Indexing with WordNet synsets can improve text retrieval
TLDR
The classical, vector space model for text retrieval is shown to give better results if WordNet synsets are chosen as the indexing space, instead of word forms, if queries are not disambiguated.
Building Hypertext Links By Computing Semantic Similarity
TLDR
A novel method for automatic hypertext generation that is based on a technique called lexical chaining, a method for discovering sequences of related words in a text, and attempts to take into account the effects of synonymy and polysemy.
Indexing by Latent Semantic Analysis
TLDR
A new method for automatic indexing and retrieval to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries.
Query expansion using lexical-semantic relations
TLDR
Examination of the utility of lexical query expansion in the large, diverse TREC collection shows that this query expansion technique makes little difference in retrieval effectiveness if the original queries are relatively complete descriptions of the information being sought, even when the concepts to be expanded are selected by hand.
FUB at TREC-10 Web Track: A Probabilistic Framework for Topic Relevance Term Weighting
TLDR
This approach endeavours to determine the weight of a word within a document in a purely theoretic way as a combination of probability distributions, with the goal of reducing as much as possible the number of parameters which must be learned and tuned from relevance assessments on training test collections.
Integrating Linguistic Resources in TC through WSD
TLDR
An approach to TC based on the integration of a training collection and a lexical database as knowledge sources is described and the utilization of WSD is presented as an aid for TC.