• Publications
gSpan: graph-based substructure pattern mining
  • X. Yan, Jiawei Han
  • Mathematics, Computer Science
  • IEEE International Conference on Data Mining…
  • 9 December 2002
TLDR
We investigate new approaches for frequent graph-based pattern mining in graph datasets and propose a novel algorithm called gSpan (graph-based substructure pattern mining), which discovers frequent substructures without candidate generation.
  • 2,197
  • 238
PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks
TLDR
We introduce a meta path-based similarity framework for objects of the same type in heterogeneous networks.
  • 988
  • 173
CloSpan: Mining Closed Sequential Patterns in Large Datasets
  • 982
  • 110
Frequent pattern mining: current status and future directions
TLDR
Frequent pattern mining has been a focused theme in data mining research for over a decade.
  • 1,333
  • 80
Graph indexing: a frequent structure-based approach
TLDR
We investigate the issues of indexing graphs and propose a novel solution by applying a graph mining technique.
  • 635
  • 56
Mining Frequent Patterns in Data Streams at Multiple Time Granularities
TLDR
In this paper, we propose computing and maintaining all the frequent patterns (which are usually more stable and smaller than the streaming data) and dynamically updating them with the incoming data streams.
  • 575
  • 53
CloseGraph: mining closed frequent graph patterns
TLDR
A closed graph pattern mining algorithm, CloseGraph, is developed by exploring several interesting pruning methods to mine closed frequent graph patterns.
  • 700
  • 52
SOBER: statistical model-based bug localization
TLDR
We propose a new statistical model-based approach, called SOBER, which localizes software bugs without any prior knowledge of program semantics. It can help programmers locate 68 out of 130 bugs in the Siemens suite when they are expected to examine no more than 10% of the code.
  • 406
  • 45
Statistical Debugging: A Hypothesis Testing-Based Approach
TLDR
We propose a new statistical method, called SOBER, which automatically localizes software faults without any prior knowledge of the program semantics.
  • 269
  • 36
Discriminative Frequent Pattern Analysis for Effective Classification
TLDR
We investigate the framework of frequent pattern-based classification, where a classification model is built in the feature space of single features as well as frequent patterns.
  • 362
  • 30