Domain Based Punjabi Text Document Clustering

Abstract

Text clustering is a text mining technique that groups similar documents into a single cluster using a similarity measure, while separating dissimilar documents. Popular clustering algorithms for text treat a document as a conglomeration of words; the syntactic and semantic relations between words are not considered. Many algorithms have been proposed to find connections among the words in a sentence using different concepts. In this paper, a hybrid algorithm for clustering Punjabi text documents has been developed that uses semantic relations among the words in a sentence to extract phrases. The extracted phrases form a feature vector of the document, which is used to find the similarity among all documents. Experimental results reveal that the hybrid algorithm performs better with real-time data sets.
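
The general idea of phrase-based feature vectors can be illustrated with a minimal sketch. This is not the paper's hybrid algorithm: the abstract does not specify the phrase extraction rules, the similarity measure, or the clustering procedure, so the sketch assumes contiguous word bigrams as a stand-in for semantically extracted phrases, cosine similarity as the measure, and a greedy single-pass grouping with a hypothetical threshold of 0.3. All function names are illustrative, and the sample documents are in English only for readability.

```python
from collections import Counter
from math import sqrt


def extract_phrases(text, n=2):
    """Stand-in phrase extractor: contiguous word n-grams.

    Assumption: the paper extracts phrases via semantic relations;
    plain n-grams are used here only to keep the sketch self-contained.
    """
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def phrase_vector(text, n=2):
    """Sparse feature vector mapping each phrase to its frequency."""
    return Counter(extract_phrases(text, n))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse phrase-frequency vectors."""
    dot = sum(a[p] * b[p] for p in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(docs, threshold=0.3):
    """Greedy single-pass clustering (an assumed procedure).

    Each document joins the first cluster whose seed document is at
    least `threshold` similar; otherwise it starts a new cluster.
    """
    vectors = [phrase_vector(d) for d in docs]
    clusters = []  # lists of document indices; the first index is the seed
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine_similarity(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


docs = [
    "the cricket team won the match",
    "the cricket team lost the match",
    "the election results were announced today",
]
clusters = cluster(docs)
print(clusters)
```

In this toy input the first two documents share three of their five bigrams and land in the same cluster, while the third shares none and forms its own. Because the vectors are built from phrases rather than single words, word order contributes to the similarity score, which is the property the abstract contrasts with bag-of-words clustering.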

Cite this paper

@inproceedings{Sharma2012DomainBP, title={Domain Based Punjabi Text Document Clustering}, author={Saurabh Sharma and Vishal Gupta}, booktitle={COLING}, year={2012} }