Corpus ID: 18106882

MLK-Means-A Hybrid Machine Learning based K-Means Clustering Algorithms for Document Clustering

P. Perumal and R. Nedunchezhian
Document clustering is useful in many information retrieval tasks, such as document browsing and the organization and viewing of retrieval results. Such techniques are currently the subject of significant global research. Generative models based on the multivariate Bernoulli and multinomial distributions have been widely used for text classification. This work presents a new hybrid algorithm called MLK-Means for clustering TMG format document data, in which the normal Euclidean distance based…

AntMeans: A New Hybrid Algorithm based on Ant Colonies for Complex Data Mining
This paper proposes a hybrid solution, “AntMeans”, built on two data mining tools: k-means and AntClust (an algorithm based on artificial ants). The solution is applied first to spatial data mining and then to image data mining.


Performance Analysis of Standard k-Means Clustering Algorithm on Clustering TMG format Document Data
Document clustering is useful in many information retrieval operations such as document browsing, organization and viewing of retrieval results, generation of Yahoo-like hierarchies of documents, …
Improving the Performance of Multivariate Bernoulli Model based Documents Clustering Algorithms using Transformation Techniques
This work proposes an FFT-based transformation technique for improving the clustering performance of a multivariate Bernoulli model based probabilistic algorithm for text clustering.
A Comparison of Document Clustering Techniques
This paper compares the two main approaches to document clustering, agglomerative hierarchical clustering and K-means, and indicates that the bisecting K-means technique is better than the standard K-means approach and as good as or better than the hierarchical approaches that were tested, across a variety of cluster evaluation metrics.
A Comparative Study of Generative Models for Document Clustering
Overall, CLUTO and DA perform the best but are also the most computationally expensive; the spectral coclustering algorithm fares worse than the vMF-based methods.
An Efficient Density based Improved K-Medoids Clustering algorithm
This paper proposes an efficient density based k-medoids clustering algorithm that will perform better than DBSCAN while handling clusters of circularly distributed data points and slightly overlapped clusters.
A Novel Density based improved k-means Clustering Algorithm – Dbkmeans
Mining knowledge from large amounts of spatial data is known as spatial data mining. It has become a highly demanding field because huge amounts of spatial data have been collected in various …
A Study of Clustering and Classification Algorithms Used in Datamining
This paper addresses various clustering and classification algorithms in detail, presents a detailed survey of existing algorithms, and examines the scalability of some existing classification algorithms.
Document Clustering and Text Summarization
A text mining tool that performs two tasks, namely document clustering and text summarization, based on computing the value of a TF-ISF measure for each word, which is an adaptation of the conventional TF-IDF measure of information retrieval.
Efficiently Clustering Documents with Committees
A new evaluation methodology based on the editing distance between output clusters and manually constructed classes (the answer key) is presented, which is more intuitive and easier to interpret than previous evaluation measures.
Evaluation of four clustering methods used in text mining
Classification systems are used more and more often in artificial intelligence, especially to analyze texts and to extract the knowledge they contain. The results of general clustering methods, though, …