Ngo Van Linh

Document clustering has become an increasingly important technique for unsupervised document organization, automatic topic extraction, and fast information retrieval or filtering. This paper proposes a Dirichlet process mixture (DPM) model approach to clustering directional data based on the von Mises-Fisher (vMF) distribution, which arises naturally for …
As the number of documents has increased rapidly in recent years, automatic text categorization has become an increasingly important and fundamental task in information retrieval and text mining. Accuracy and interpretability are two important aspects of a text classifier. While the accuracy of a classifier measures the ability to correctly classify unseen …
Classification on networked data plays an important role in many problems, such as web page categorization and the classification of bibliographic information networks. Most classification algorithms on information networks work by iteratively propagating information through network graphs. One important issue concerning iterative classifiers is that false …
Document classification is essential to information retrieval and text mining. In real life, unlabeled data is readily available, whereas labeled data is often laborious, expensive, and slow to obtain. This paper proposes a novel document classification approach based on a semi-supervised vMF mixture model on the document manifold, called Laplacian regularized …
Classification of multilabel documents is essential to information retrieval and text mining. Most existing approaches to multilabel text classification do not pay attention to the relationship between class labels and input documents, and they always rely on labeled data for classification. In fact, unlabeled data is readily available whereas …
Document clustering has become an increasingly important technique for unsupervised document organization, automatic topic extraction, and fast information retrieval or filtering. The generative model for document clustering based on the von Mises-Fisher (vMF) distribution generally produces better clustering results than other generative models. However, …
Analyzing texts from social media poses many challenges, including their short length, dynamic nature, and huge volume. Short texts do not provide enough information, so statistical models often fail to work. In this paper, we present a very simple approach (namely, bag-of-biterms) that helps statistical models such as Hierarchical Dirichlet Processes (HDP) to …