Nguyen Kim Anh

Learn More
Document clustering has become an increasingly important technique for unsupervised document organization, automatic topic extraction, and fast information retrieval or filtering. This paper proposes a Dirichlet process mixture (DPM) model approach to clustering directional data based on the von Mises-Fisher (vMF) distribution, which arises naturally for(More)
Document classifications is essential to information retrieval and text mining. In real life, unlabeled data is readily available whereas labeled ones are often laborious, expensive and slow to obtain. This paper proposes a novel Document Classification approach based on semi-supervised vMF mixture model on document manifold, called Laplacian regularized(More)
Document clustering has become an increasingly important technique for unsupervised document organization, automatic topic extraction, and fast information retrieval or filtering. The generative model for document clustering based on the von Mises-Fisher (vMF) distribution generally produces better clustering results than other generative models. However,(More)
Classification of multilabel documents is essential to information retrieval and text mining. Most of existing approaches to multilabel text classification do not pay attention to relationship between class labels and input documents and also rely on labeled data all the time for classification. In fact, unlabeled data is readily available whereas(More)
In order to improve the usability of a search engines, Query Suggestion, a technique for generating alternative queries to Web users, has become an indispensable feature for such systems. All major web-search engines and most existing works on query suggestion utilize query logs to determine possible query suggestions. However, for many search systems,(More)
In this paper, we introduce a partially automated method to generate qualified answers at multiple abstraction levels for database queries. We examine the issues involving data summarization by Attribute-Oriented Induction (AOI) on large databases using fuzzy concept hierarchies. Because a node may have many abstracts, the fuzzy hierarchies become more(More)