• Publications
gSpan: graph-based substructure pattern mining
  • Xifeng Yan, Jiawei Han
  • Mathematics, Computer Science
    IEEE International Conference on Data Mining…
  • 9 December 2002
gSpan (graph-based substructure pattern mining) is a novel algorithm that discovers frequent substructures without candidate generation. It builds a new lexicographic order among graphs and maps each graph to a unique minimum DFS code as its canonical label.
PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks
Under the meta path framework, a novel similarity measure called PathSim is defined that is able to find peer objects in the network (e.g., authors in a similar field and with a similar reputation), which turns out to be more meaningful in many scenarios than random-walk-based similarity measures.
Frequent pattern mining: current status and future directions
It is believed that frequent pattern mining research has substantially broadened the scope of data analysis and will have a deep impact on data mining methodologies and applications in the long run. However, there are still some challenging research issues that need to be solved before frequent pattern mining can claim to be a cornerstone approach in data mining applications.
Graph indexing: a frequent structure-based approach
The gIndex approach not only provides an elegant solution to the graph indexing problem, but also demonstrates how database indexing and query processing can benefit from data mining, especially frequent pattern mining.
CloseGraph: mining closed frequent graph patterns
CloseGraph, a closed graph pattern mining algorithm, is developed by exploring several interesting pruning methods. It not only dramatically reduces the number of unnecessary subgraphs generated but also substantially increases the efficiency of mining, especially in the presence of large graph patterns.
Mining Frequent Patterns in Data Streams at Multiple Time Granularities
This paper proposes computing and maintaining all the frequent patterns, dynamically updating them with the incoming data streams, and incrementally maintaining tilted-time windows for each pattern at multiple time granularities.
SOBER: statistical model-based bug localization
The results demonstrate the power of the approach in bug localization: SOBER can help programmers locate 68 out of 130 bugs in the Siemens suite when programmers are expected to examine no more than 10% of the code, whereas the best previously reported result is 52 out of 130.
Statistical Debugging: A Hypothesis Testing-Based Approach
A new statistical method, called SOBER, is proposed. It models the predicate evaluation in both correct and incorrect executions and automatically localizes software faults without any prior knowledge of the program semantics.
Discriminative Frequent Pattern Analysis for Effective Classification
This paper develops a strategy to set minimum support in frequent pattern mining for generating useful patterns, and demonstrates that the frequent pattern-based classification framework can achieve good scalability and high accuracy in classifying large datasets.