Jiefeng Cheng

Learn More
Data uncertainty is inherent in applications such as sensor monitoring systems, location-based services, and biological databases. To manage this vast amount of imprecise information, probabilistic databases have been recently developed. In this paper, we study the discovery of <i>frequent patterns and association rules</i> from probabilistic data under the(More)
There are numerous applications that need to deal with a large graph and need to query reachability between nodes in the graph. A 2-hop cover can compactly represent the whole edge transitive closure of a graph in <i>O(|V| . |E|</i><sup>1/2</sup>) space, and be used to answer reachability query efficiently. However, it is challenging to compute a 2-hop(More)
Due to rapid growth of the Internet technology and new scientific/technological advances, the number of applications that model data as graphs increases, because graphs have high expressive power to model complicated structures. The dominance of graphs in real-world applications asks for new graph data management so that users can access graph data(More)
The need of processing graph reachability queries stems from many applications that manage complex data as graphs. The applications include transportation network, Internet traffic analyzing, Web navigation, semantic web, chemical informatics and bio-informatics systems, and computer vision. A graph reachability query, as one of the primary tasks, is to(More)
Recent studies show that disk-based graph computation on just a single PC can be as highly competitive as cluster-based computing systems on large-scale problems. Inspired by this remarkable progress, we develop VENUS, a disk-based graph computation system which is able to handle billion-scale problems efficiently on a commodity PC. VENUS adopts a novel(More)
Due to rapid growth of the Internet and new scientific/technological advances, there exist many new applications that model data as graphs, because graphs have sufficient expressiveness to model complicated structures. The dominance of graphs in real-world applications demands new graph processing techniques to access large data graphs effectively and(More)
Despite its popularity, SimRank is computationally costly, in both time and space. In particular, its recursive nature poses a great challenge in using modern distributed computing power, and also prevents querying similarities individually. Existing solutions suffer greatly from these practical issues. In this paper, we break such dependency for maximum(More)
In software maintenance and evolution, it is common that developers want to apply a change to a number of similar places. Due to the size and complexity of the code base, it is challenging for developers to locate all the places that need the change. A main challenge in locating the places that need the change is that, these places share certain common(More)
There exist many graph-based applications including bioinformatics, social science, link analysis, citation analysis, and collaborative work. All need to deal with a large data graph. Given a large data graph, in this paper, we study finding top-k answers for a graph pattern query (kGPM), and in particular, we focus on top-k cyclic graph queries where a(More)