This paper presents two unsupervised frameworks for solving this problem: one based on the link structure of Web pages, the other using Agglomerative/Conglomerative Double Clustering (A/CDC), an application of a recently introduced multi-way distributional clustering method.
This work studies an approach to text categorization that combines distributional clustering of words with a Support Vector Machine (SVM) classifier; the resulting word-cluster representation significantly outperforms the word-based representation in categorization accuracy or representation efficiency.
An extensive benchmark study of email foldering using two large corpora of real-world email messages and foldering schemes: one from former Enron employees, the other from participants in an SRI research project.
This tutorial gives a broad view of modern approaches for scaling up machine learning and data mining methods on parallel/distributed platforms and provides an integrated overview of state-of-the-art platforms and algorithm choices.
An end-to-end system that extracts a user's social network and its members' contact information from the user's email inbox; the paper discusses the system's capabilities for address book population, expert-finding, and social network analysis.
An attempt to incorporate bigrams into a document representation based on distributional clusters of unigrams; the reported result is (to the authors' knowledge) the best categorization result ever achieved on this highly popular dataset.
This work describes a text categorization approach that combines distributional clusters of features with a support vector machine (SVM) classifier, yielding high-performance text classification that can outperform other recent methods in categorization accuracy and representation efficiency.
Comraf* (pronounced Comraf-Star), a lightweight version of the recently introduced combinatorial Markov random field (Comraf), efficiently incorporates various views in multi-modal clustering, which allows great modeling flexibility.