An Empirical Selection Method for Document Clustering

Authors: P. Perumal, R. Nedunchezhian, D. Brindha
Journal: International Journal of Computer Applications
Model selection is the task of selecting a model from a set of candidate models. The latent variable method considered here is capable of establishing hidden semantic relations among the observed features, using a number of latent variables. Selecting the correct number of latent variables is critical. In most previous research, the number of latent topics was selected based on the number of invoked classes. This paper presents a method, based on a backward elimination approach, which is capable of unsupervised…
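The idea of choosing the number of latent variables by backward elimination can be illustrated with a minimal sketch. The abstract is truncated and does not state the model, the elimination rule, or the scoring criterion, so everything below is an assumption made for illustration: a PLSA-style mixture fitted by plain EM, pruning of the component with the smallest mixing weight P(z), and BIC as the score. The names `fit_plsa` and `select_k_backward` are hypothetical and are not taken from the paper.

```python
import numpy as np

def fit_plsa(counts, pz, pd_z, pw_z, n_iter=50):
    """Run EM for a PLSA-style model, starting from the given parameters.

    counts: (D, W) document-term count matrix.
    pz: (K,) P(z); pd_z: (K, D) P(d|z); pw_z: (K, W) P(w|z).
    Returns the updated parameters and the final log-likelihood.
    """
    eps = 1e-12
    for _ in range(n_iter):
        # E-step: posterior P(z | d, w), shape (K, D, W)
        joint = pz[:, None, None] * pd_z[:, :, None] * pw_z[:, None, :]
        post = joint / (joint.sum(axis=0, keepdims=True) + eps)
        # M-step: re-estimate parameters from expected counts
        weighted = counts[None, :, :] * post
        pd_z = weighted.sum(axis=2)   # unnormalized, shape (K, D)
        pw_z = weighted.sum(axis=1)   # unnormalized, shape (K, W)
        pz = pd_z.sum(axis=1)         # expected mass per component
        pd_z = pd_z / (pz[:, None] + eps)
        pw_z = pw_z / (pz[:, None] + eps)
        pz = pz / pz.sum()
    mix = np.einsum("k,kd,kw->dw", pz, pd_z, pw_z)
    loglik = float((counts * np.log(mix + eps)).sum())
    return pz, pd_z, pw_z, loglik

def select_k_backward(counts, k_max=8, seed=0):
    """Assumed backward elimination: start with k_max components, repeatedly
    drop the one with the smallest P(z), refit, and score each size with BIC."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    pz = np.full(k_max, 1.0 / k_max)
    pd_z = rng.random((k_max, n_docs))
    pd_z /= pd_z.sum(axis=1, keepdims=True)
    pw_z = rng.random((k_max, n_words))
    pw_z /= pw_z.sum(axis=1, keepdims=True)
    n_obs = counts.sum()
    scores = {}
    while True:
        pz, pd_z, pw_z, loglik = fit_plsa(counts, pz, pd_z, pw_z)
        k = len(pz)
        n_params = (k - 1) + k * (n_docs - 1) + k * (n_words - 1)
        scores[k] = -2.0 * loglik + n_params * np.log(n_obs)  # BIC (assumed)
        if k == 1:
            break
        # Eliminate the weakest component and renormalize the mixing weights
        keep = np.arange(k) != np.argmin(pz)
        pz = pz[keep] / pz[keep].sum()
        pd_z, pw_z = pd_z[keep], pw_z[keep]
    best_k = min(scores, key=scores.get)
    return best_k, scores
```

Calling `select_k_backward(counts, k_max=6)` on a document-term count matrix returns the component count with the lowest score together with the score for every size tried. Reusing the surviving parameters as the starting point for each refit is a design choice of this sketch; it keeps the search cheap but makes the result depend on the initial fit.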
1 Citation
Indoor mobile robot localization using KNN
This paper describes the use of sixteen 40 kHz ultrasonic sensors, known as the Ultrasonic Sensor Bank (USB-16), mounted on a mobile robot platform. The Homogeneous Transformation Matrix (HTM) and…


Using backward elimination with a new model order reduction algorithm to select best double mixture model for document clustering
A method, based on backward elimination approach, which is capable of unsupervised order selection in PLSA, and leads to an optimized number of latent variables and in turn achieves better clustering performance compared to the conventional model selection methods.
Document clustering via dirichlet process mixture model with feature selection
This paper proposes a novel approach, namely DPMFS, to group documents into a set of clusters while the number of document clusters is determined by the Dirichlet process mixture model automatically; and to identify the discriminative words and separate them from irrelevant noise words via stochastic search variable selection technique.
A Comparison of Document Clustering Techniques
This paper compares the two main approaches to document clustering, agglomerative hierarchical clustering and K-means, and indicates that the bisecting K-means technique is better than the standard K-means approach and as good as or better than the hierarchical approaches that were tested for a variety of cluster evaluation metrics.
Indexing by Latent Semantic Analysis
A new method for automatic indexing and retrieval to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries.
Variable selection via Gibbs sampling
A crucial problem in building a multiple regression model is the selection of predictors to include. The main thrust of this article is to propose and develop a procedure that uses…
Variable selection in clustering via Dirichlet process mixture models
This paper introduces a latent binary vector to identify discriminating variables and uses Dirichlet process mixture models to define the cluster structure; it updates the variable selection index using a Metropolis algorithm and obtains inference on the cluster structure via a split-merge Markov chain Monte Carlo technique.
Probabilistic Latent Semantic Analysis
This work proposes a widely applicable generalization of maximum likelihood model fitting by tempered EM, based on a mixture decomposition derived from a latent class model which results in a more principled approach which has a solid foundation in statistics.
Unsupervised Learning of Finite Mixture Models
The novelty of the approach is that it does not use a model selection criterion to choose one among a set of preestimated candidate models; instead, it seamlessly integrates estimation and model selection in a single algorithm.
Hierarchical Dirichlet Processes
We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that…
Dirichlet Process Mixture Models for Verb Clustering
A method to add human supervision to the Dirichlet Process Mixture Models in order to influence the solution with respect to some prior knowledge, highlighting the benefits of the chosen method compared to previously used clustering approaches.