Document clustering has become an increasingly important technique for unsupervised document organization, automatic topic extraction, and fast information retrieval or filtering. This paper proposes a Dirichlet process mixture (DPM) model approach to clustering directional data based on the von Mises-Fisher (vMF) distribution, which arises naturally for…
Classification on networked data plays an important role in many problems, such as web page categorization and the classification of bibliographic information networks. Most classification algorithms on information networks work by iteratively propagating information through network graphs. One important issue concerning iterative classifiers is that false…
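As a minimal sketch of the general idea of iteratively propagating information through a graph (not the algorithm of the paper above), the Python below runs majority-vote label propagation: each unlabeled node repeatedly takes the most common label among its already-labeled neighbours until no label changes. The graph, the seed labels, and the name `propagate_labels` are hypothetical.

```python
from collections import Counter


def propagate_labels(adjacency, seeds, max_iters=10):
    """Iterative classification by majority vote over labeled neighbours.

    Assumptions (illustrative only): an unweighted graph stored as an
    adjacency dict, and seed labels that are never overwritten.
    """
    labels = dict(seeds)
    for _ in range(max_iters):
        changed = False
        new_labels = dict(labels)  # update synchronously from last round's labels
        for node, neighbors in adjacency.items():
            if node in seeds:
                continue
            votes = Counter(labels[n] for n in neighbors if n in labels)
            if not votes:
                continue  # no labeled neighbour yet; retry next round
            best = votes.most_common(1)[0][0]
            if labels.get(node) != best:
                new_labels[node] = best
                changed = True
        labels = new_labels
        if not changed:
            break
    return labels


# Two hypothetical clusters of pages joined by a single c-d link.
adjacency = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
result = propagate_labels(adjacency, {"a": "sports", "f": "politics"})
```

Each unlabeled node's label depends on its neighbours' labels from the previous round, so an early wrong label can keep spreading in later rounds; that dependency is why iterative classifiers need care.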
Document classification is essential to information retrieval and text mining. In real life, unlabeled data is readily available, whereas labeled data is often laborious, expensive, and slow to obtain. This paper proposes a novel document classification approach based on a semi-supervised vMF mixture model on the document manifold, called Laplacian regularized…
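The method name above is cut off, so the sketch below shows only the generic graph-Laplacian smoothness term that "Laplacian regularized" usually refers to, not this paper's objective. For a symmetric document-similarity matrix `W` and per-document cluster posteriors `F`, the penalty (1/2) Σ_ij W_ij ||F_i − F_j||² equals trace(Fᵀ L F) with L = D − W. The weights, posteriors, and function names are hypothetical.

```python
def laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def pairwise_penalty(W, F):
    """(1/2) * sum_ij W_ij * ||F_i - F_j||^2: large when similar documents
    (large W_ij) receive different cluster posteriors F_i and F_j."""
    n = len(W)
    return 0.5 * sum(W[i][j] * sum((a - b) ** 2 for a, b in zip(F[i], F[j]))
                     for i in range(n) for j in range(n))


def trace_penalty(W, F):
    """The same quantity written as trace(F^T L F)."""
    L = laplacian(W)
    n, k = len(F), len(F[0])
    return sum(F[i][c] * L[i][j] * F[j][c]
               for c in range(k) for i in range(n) for j in range(n))


# Hypothetical: documents 0 and 1 are very similar, 1 and 2 only weakly.
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.1],
     [0.0, 0.1, 0.0]]
F_smooth = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]  # neighbours agree
F_rough = [[0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]   # strongly linked pair disagrees
```

Here `pairwise_penalty(W, F_smooth)` is roughly 0.118 while `pairwise_penalty(W, F_rough)` is roughly 1.28, so adding such a term to a mixture-model objective favours posteriors that vary smoothly over the similarity graph.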
Document clustering has become an increasingly important technique for unsupervised document organization, automatic topic extraction, and fast information retrieval or filtering. The generative model for document clustering based on the von Mises-Fisher (vMF) distribution generally produces better clustering results than other generative models. However,…
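To illustrate how a vMF generative model assigns documents to clusters, the sketch below computes the soft assignments (the E-step) of a generic vMF mixture. On the unit sphere the vMF density is f(x; μ, κ) = C_d(κ) exp(κ μᵀx). The sketch assumes every component shares one concentration κ, so C_d(κ) cancels and only π_k exp(κ μ_kᵀ x) matters. The means, weights, κ value, and three-term vocabulary are hypothetical, and this is not the specific model of any paper listed here.

```python
import math


def normalize(v):
    """Project a term vector onto the unit sphere (L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def vmf_responsibilities(x, means, weights, kappa):
    """Posterior cluster probabilities for document vector x under a vMF
    mixture whose components all share the concentration kappa.

    With a shared kappa the normalizing constant C_d(kappa) is identical for
    every component and cancels, leaving pi_k * exp(kappa * mu_k . x).
    """
    x = normalize(x)
    logits = [math.log(w) + kappa * sum(m * xi for m, xi in zip(normalize(mu), x))
              for mu, w in zip(means, weights)]
    top = max(logits)  # subtract the max before exponentiating (log-sum-exp trick)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical vocabulary: ["goal", "match", "vote"].
means = [[1, 1, 0], [0, 0, 1]]  # a "sports" direction and a "politics" direction
weights = [0.5, 0.5]
r = vmf_responsibilities([2, 1, 0], means, weights, kappa=10.0)
```

Because only the direction of each document vector matters (cosine similarity to μ_k), document length does not affect the assignment; this is one common argument for vMF models on normalized text features.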
Classification of multilabel documents is essential to information retrieval and text mining. Most existing approaches to multilabel text classification ignore the relationship between class labels and input documents, and they always rely on labeled data for classification. In fact, unlabeled data is readily available, whereas…
As the number of documents has been increasing rapidly in recent years, automatic text categorization is becoming a more important and fundamental task in information retrieval and text mining. Accuracy and interpretability are two important aspects of a text classifier. While the accuracy of a classifier measures its ability to correctly classify unseen…
Automatic text summarization plays an important role in information retrieval and text mining. Furthermore, it provides a useful solution to the information overload problem. In this paper, we propose a simplicial NMF-based unsupervised generic document summarization method that inherits some advantages of simplicial NMF, such as easy interpretability,…
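The sketch below shows the general shape of NMF-based extractive summarization. It uses plain NMF with the Lee-Seung multiplicative updates in place of simplicial NMF, which it does not implement. A nonnegative term-by-sentence matrix V is factored as W H, and for each latent topic the sentence with the largest weight in H is selected. The toy matrix, the iteration count, and the names `nmf` and `select_sentences` are hypothetical.

```python
import numpy as np


def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Factor nonnegative V (terms x sentences) as W @ H with Lee-Seung
    multiplicative updates for the Frobenius-norm objective."""
    rng = np.random.default_rng(seed)
    n_terms, n_sents = V.shape
    W = rng.random((n_terms, k)) + eps  # strictly positive initialization
    H = rng.random((k, n_sents)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def select_sentences(V, k):
    """Pick one sentence per latent topic: the one with the largest H weight."""
    _, H = nmf(V, k)
    chosen = []
    for topic in range(k):
        # Walk sentences from strongest to weakest, skipping ones already
        # chosen, so two topics never select the same sentence.
        for idx in np.argsort(H[topic])[::-1]:
            if int(idx) not in chosen:
                chosen.append(int(idx))
                break
    return sorted(chosen)


# Hypothetical term counts; rows: goal, match, vote, election; columns: sentences.
V = np.array([[1, 2, 0, 0],
              [1, 2, 0, 0],
              [0, 0, 1, 2],
              [0, 0, 1, 2]], dtype=float)
summary = select_sentences(V, k=2)
```

Each row of H gives nonnegative weights of sentences for one topic, which is what makes NMF-based selection easy to interpret. A simplicial variant changes the constraints on the factors, not this overall select-by-topic pattern.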