Effective K-Means Document Clustering using Dictionary Defined Lexical Analyzer (DDLA)

Abstract

The rapid growth in the number of documents makes clustering them difficult. Document clustering is the process of grouping related documents from a large collection. Mining related documents from a collection that is unlabelled is challenging. To address this difficulty, clustering is used to filter the unlabelled documents from the large collection. This paper introduces a new approach to document clustering that combines the enhanced k-means algorithm [1] with a Dictionary Defined Lexical Analyzer (DDLA). The basic k-means algorithm clusters numeric values efficiently; with the inclusion of DDLA, characters, words and sentences can also be clustered. Documents are clustered based on their weights [7] using the bisecting k-means algorithm [1, 2] and a topic detection method. Meaningful labels for the documents are discovered based on semantic similarity [8]. Combining the enhanced k-means algorithm with DDLA thus offers a simple and efficient way to cluster unlabelled documents.
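The idea of turning text into numeric weights and then applying bisecting k-means can be illustrated with a minimal sketch. This is not the paper's DDLA implementation: the `DICTIONARY` contents, the raw term-count weighting, and all function names below are assumptions made for illustration, and the seeding is made deterministic only to keep the example reproducible.

```python
import re
from collections import Counter

# Hypothetical dictionary: the terms the lexical analyzer recognises.
DICTIONARY = ["goal", "match", "team", "stock", "market", "price"]

def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def vectorize(text):
    """Map a document to one count per dictionary term (assumed weighting)."""
    counts = Counter(tokenize(text))
    return [float(counts[term]) for term in DICTIONARY]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def two_means(indices, vectors, iterations=10):
    """Split `indices` into two groups with plain 2-means.

    Seeds are the first document and the document farthest from it.
    """
    first = vectors[indices[0]]
    far = max(indices, key=lambda i: sq_dist(vectors[i], first))
    centroids = [first, vectors[far]]
    groups = [list(indices), []]
    for _ in range(iterations):
        groups = [[], []]
        for i in indices:
            d0 = sq_dist(vectors[i], centroids[0])
            d1 = sq_dist(vectors[i], centroids[1])
            groups[0 if d0 <= d1 else 1].append(i)
        if not groups[0] or not groups[1]:
            break  # the documents cannot be separated any further
        centroids = [mean([vectors[i] for i in g]) for g in groups]
    return groups

def bisecting_kmeans(vectors, k):
    """Repeatedly bisect the largest cluster until k clusters exist."""
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        clusters.sort(key=len)
        largest = clusters.pop()
        left, right = two_means(largest, vectors)
        if not left or not right:
            clusters.append(largest)  # no further split is possible
            break
        clusters += [left, right]
    return clusters

docs = [
    "The team scored a late goal to win the match",
    "Stock price fell as the market closed",
    "A goal in every match keeps the team on top",
    "Market analysts expect the stock price to rise",
]
vectors = [vectorize(d) for d in docs]
clusters = bisecting_kmeans(vectors, k=2)
```

Each entry of `clusters` is a list of document indices into `docs`. The sketch does no stemming or semantic matching, so a word such as "markets" would not match the dictionary term "market"; a fuller analyzer would need to handle such variants.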

Cite this paper

@inproceedings{Raj2012EffectiveKD, title={Effective K-Means Document Clustering using Dictionary Defined Lexical Analyzer (DDLA)}, author={R.Ranga Raj}, year={2012} }