Reachability queries on large directed graphs have attracted much attention recently. The existing work either uses spanning structures, such as chains or trees, to compress the complete transitive closure, or utilizes the 2-hop strategy to describe the reachability. Almost all of these approaches work well for very sparse graphs. However, the challenging …
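The 2-hop strategy mentioned above answers a query "does u reach v?" by checking whether an out-label of u and an in-label of v share a common hub vertex. The following is a minimal sketch of that idea, assuming a pruned labeling in the style of pruned landmark labeling; it is not the method of this paper. The function names `build_2hop_labels` and `reachable` and the `order` parameter are hypothetical choices made for illustration.

```python
from collections import deque


def build_2hop_labels(adj, order):
    """Build a pruned 2-hop reachability labeling (illustrative sketch).

    adj:   dict mapping every vertex (sinks included) to a list of successors.
    order: all vertices, processed one by one as hubs; the order only affects
           label size, not correctness.
    Returns (l_out, l_in) such that u reaches v iff l_out[u] & l_in[v] != set().
    """
    # Reverse adjacency, used for the backward searches.
    radj = {v: [] for v in adj}
    for u, succs in adj.items():
        for w in succs:
            radj[w].append(u)

    l_out = {v: set() for v in adj}  # hubs that v can reach
    l_in = {v: set() for v in adj}   # hubs that can reach v

    def covered(u, v):
        # True if an already-processed hub witnesses u -> hub -> v.
        return bool(l_out[u] & l_in[v])

    for hub in order:
        # Forward BFS: record hub in l_in[x] for x reachable from hub,
        # pruning wherever an earlier hub already covers hub -> x.
        queue, seen = deque([hub]), {hub}
        while queue:
            x = queue.popleft()
            if covered(hub, x):
                continue
            l_in[x].add(hub)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        # Backward BFS: record hub in l_out[x] for x that reach hub,
        # with the same pruning rule.
        queue, seen = deque([hub]), {hub}
        while queue:
            x = queue.popleft()
            if covered(x, hub):
                continue
            l_out[x].add(hub)
            for y in radj[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
    return l_out, l_in


def reachable(l_out, l_in, u, v):
    """2-hop query: u reaches v iff their labels share a hub."""
    return bool(l_out[u] & l_in[v])


# Toy graph: a cycle 0 -> 1 -> 2 -> 0 feeding into 3, plus a chain 5 -> 4 -> 3.
adj = {0: [1], 1: [2], 2: [0, 3], 3: [], 4: [3], 5: [4]}
l_out, l_in = build_2hop_labels(adj, order=list(adj))
print(reachable(l_out, l_in, 1, 3))  # True
print(reachable(l_out, l_in, 3, 0))  # False
```

Pruning is what keeps the labels smaller than the full transitive closure: a hub is not recorded for a vertex whose pair is already witnessed by an earlier hub. Processing high-degree vertices first is a common heuristic for the `order`; the sketch simply uses insertion order.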
In this paper we discuss a very simple approach of combining content and link information in graph structures for the purpose of community discovery, a fundamental task in network analysis. Our approach hinges on the basic intuition that many networks contain noise in the link structure and that content information can help strengthen the community signal. …
Transactional data are ubiquitous. Several methods, including frequent itemset mining and co-clustering, have been proposed to analyze transactional databases. In this work, we propose a new research problem to succinctly summarize transactional databases. Solving this problem requires linking the high-level structure of the database to a potentially huge …
In this work, we study a visual data mining problem: given a set of discovered overlapping submatrices of interest, how can we order the rows and columns of the data matrix to best display these submatrices and their relationships? We find that this problem can be converted to the hypergraph ordering problem, which generalizes the traditional minimal linear …
Estimating the number of frequent itemsets for a minimum support α in a large dataset is of great interest from both theoretical and practical perspectives. However, finding not only the number of frequent itemsets, but even the number of maximal frequent itemsets, is #P-complete. In this study, we provide a theoretical investigation of the sampling …
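To make the quantities being counted concrete, here is a brute-force sketch that enumerates the frequent and maximal frequent itemsets of a toy database. It is exponential in the number of distinct items, so it only illustrates the definitions and does not reproduce the sampling approach studied in the paper. We assume α is given as an absolute transaction count (`min_support`); the function names are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return every non-empty itemset contained in at least `min_support`
    transactions (support counted as an absolute number here).

    Brute force: exponential in the number of distinct items, toy inputs only.
    """
    items = sorted(set().union(*transactions))
    frequent = []
    for k in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, k):
            s = frozenset(candidate)
            support = sum(1 for t in transactions if s <= t)
            if support >= min_support:
                frequent.append(s)
                found_any = True
        # Downward closure: if no k-itemset is frequent, no larger one can be.
        if not found_any:
            break
    return frequent


def maximal_itemsets(frequent):
    """Frequent itemsets with no frequent proper superset."""
    return [x for x in frequent if not any(x < y for y in frequent)]


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
freq = frequent_itemsets(db, min_support=3)
print(len(freq))                    # 6: a, b, c, ab, ac, bc
print(len(maximal_itemsets(freq)))  # 3: ab, ac, bc
```

Even this tiny example shows the gap the abstract refers to: the maximal itemsets are far fewer than all frequent itemsets, yet counting either exactly is hard in general.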
The ability to summarize a large number of network patterns discovered from biomedical data provides valuable information for use in many applications. We show that several variants of the problem are all NP-hard, and merging network patterns is a practical solution for these applications. In this work, we propose an algorithmic framework for merging …
We study online social group dynamics based on how group members diverge in their online discussions. Previous studies mostly focused on the link structure to characterize social group dynamics, whereas the group behavior of content generation in discussions is not well understood. In particular, we use Jensen-Shannon (JS) divergence to measure the …
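For reference, the generalized Jensen-Shannon divergence of distributions P_1, …, P_n with weights w_i is H(Σ w_i P_i) − Σ w_i H(P_i); for two equally weighted distributions this equals ½ KL(P‖M) + ½ KL(Q‖M) with M = (P + Q)/2. The sketch below assumes each group member is represented by a word-frequency distribution over their posts. That representation is our assumption, since the abstract is cut off before describing it, and the names `normalize`, `entropy`, `js_divergence`, and the sample `posts` are hypothetical.

```python
import math
from collections import Counter


def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcome -> probability."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def normalize(counts):
    """Turn raw counts into a probability distribution."""
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def js_divergence(dists, weights=None):
    """Generalized Jensen-Shannon divergence:
    H(sum_i w_i P_i) - sum_i w_i H(P_i), with uniform weights by default."""
    if weights is None:
        weights = [1.0 / len(dists)] * len(dists)
    # Mixture distribution M = sum_i w_i P_i.
    mixture = Counter()
    for w, dist in zip(weights, dists):
        for outcome, p in dist.items():
            mixture[outcome] += w * p
    return entropy(mixture) - sum(w * entropy(d) for w, d in zip(weights, dists))


# Hypothetical group: each member's posts reduced to a bag of words.
posts = {
    "alice": "graph mining on social graph data",
    "bob": "graph mining on text data",
    "carol": "community detection in social networks",
}
member_dists = [normalize(Counter(text.split())) for text in posts.values()]
print(js_divergence(member_dists))
```

With base-2 logarithms the value is 0 when all members use words identically and is bounded by log2(n) for n members (1 bit for two), which makes it convenient for comparing groups of different sizes.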