gSpan: graph-based substructure pattern mining
gSpan (graph-based substructure pattern mining) is a novel algorithm that discovers frequent substructures without candidate generation by building a new lexicographic order among graphs and mapping each graph to a unique minimum DFS code as its canonical label.
PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks
- Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu
- Computer Science · Proceedings of the VLDB Endowment
- 1 August 2011
Under the meta path framework, a novel similarity measure called PathSim is defined that can find peer objects in the network (e.g., authors in a similar field and with similar reputation), which turns out to be more meaningful in many scenarios than random-walk-based similarity measures.
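For a symmetric meta path such as Author-Paper-Author, PathSim scores two objects x and y as 2·|paths x→y| / (|paths x→x| + |paths y→y|). If the path counts are collected in the commuting matrix M = W_AP · W_AP^T, this is 2·M[x][y] / (M[x][x] + M[y][y]). A minimal sketch in plain Python; the toy incidence matrix and the helper names are hypothetical:

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    # Plain list-of-lists matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def commuting_matrix(w_ap):
    """Path counts for the meta path Author-Paper-Author: M = W_AP * W_AP^T."""
    return matmul(w_ap, transpose(w_ap))

def pathsim(m, i, j):
    """PathSim for a symmetric meta path: 2*M[i][j] / (M[i][i] + M[j][j])."""
    denom = m[i][i] + m[j][j]
    return 2 * m[i][j] / denom if denom else 0.0

# Hypothetical toy data: rows are authors a0..a2, columns are papers p0..p3.
W_AP = [
    [1, 1, 0, 0],  # a0 wrote p0, p1
    [1, 1, 0, 0],  # a1 wrote p0, p1
    [1, 1, 1, 1],  # a2 wrote p0..p3
]
M = commuting_matrix(W_AP)
```

Here a0 and a1 get PathSim 1.0, while a0 and a2 get 2/3 even though a0 shares all of its papers with a2. The normalization by each object's own path count is what favors peers of comparable visibility over simply the best-connected object.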
CloSpan: Mining Closed Sequential Patterns in Large Datasets
Frequent pattern mining: current status and future directions
- Jiawei Han, Hong Cheng, Dong Xin, Xifeng Yan
- Computer Science · Data Mining and Knowledge Discovery
- 1 August 2007
Frequent pattern mining research is believed to have substantially broadened the scope of data analysis and to have a deep long-term impact on data mining methodologies and applications; however, several challenging research issues must still be solved before frequent pattern mining can claim to be a cornerstone approach in data mining applications.
Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting
- Shiyang Li, Xiaoyong Jin, Xifeng Yan
- Computer Science · Neural Information Processing Systems
- 29 June 2019
Convolutional self-attention is proposed, producing queries and keys with causal convolution so that local context can be better incorporated into the attention mechanism. The LogSparse Transformer is also proposed, improving forecasting accuracy for time series with fine granularity and strong long-term dependencies under a constrained memory budget.
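The summary does not spell out the LogSparse pattern. As we understand it, each cell attends to itself and to earlier cells at exponentially growing distances (l - 1, l - 2, l - 4, ...), so a cell attends to O(log l) cells instead of all l. A minimal sketch that builds such a causal mask, assuming 0-indexed positions; the function names are hypothetical, and the causal-convolution projection of queries and keys is not shown:

```python
def logsparse_indices(l):
    """Positions that 0-indexed position l may attend to: itself and l - 2**k for k >= 0."""
    allowed = {l}
    step = 1
    while l - step >= 0:
        allowed.add(l - step)
        step *= 2  # exponentially growing look-back distance
    return sorted(allowed)

def logsparse_mask(n):
    """Boolean n x n mask; mask[i][j] is True when position i may attend to position j."""
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in logsparse_indices(i):
            mask[i][j] = True
    return mask
```

For example, position 8 attends only to positions 0, 4, 6, 7 and 8. Stacking several such layers still lets information from every earlier position reach a cell indirectly.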
CloseGraph: mining closed frequent graph patterns
CloseGraph, a closed graph pattern mining algorithm, is developed by exploring several interesting pruning methods and is shown not only to dramatically reduce the number of unnecessary subgraphs generated but also to substantially increase mining efficiency, especially in the presence of large graph patterns.
Graph indexing: a frequent structure-based approach
The gIndex approach not only provides an elegant solution to the graph indexing problem but also demonstrates how database indexing and query processing can benefit from data mining, especially frequent pattern mining.
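Feature-based graph indexing follows a filter-then-verify scheme. If a query graph contains an indexed fragment, then any database graph that contains the query must also contain that fragment, so intersecting the posting lists of the query's indexed fragments yields a candidate set. A minimal sketch of the filtering step only. It assumes each graph's indexed fragments have already been found and are given as hypothetical string labels, and it omits gIndex's selection of discriminative frequent fragments and the final subgraph-isomorphism verification:

```python
def build_index(graph_features):
    """Inverted index: indexed fragment -> ids of database graphs containing it."""
    index = {}
    for gid, feats in graph_features.items():
        for f in feats:
            index.setdefault(f, set()).add(gid)
    return index

def filter_candidates(index, query_features, all_ids):
    """Keep only graphs that contain every indexed fragment of the query."""
    cand = set(all_ids)
    for f in query_features:
        if f in index:  # unindexed fragments cannot be used for pruning
            cand &= index[f]
    return cand

# Hypothetical database: graph id -> indexed fragments it contains.
DB = {1: {"C-C", "C-O"}, 2: {"C-C"}, 3: {"C-O", "C=O"}}
INDEX = build_index(DB)
```

A query containing C-C, C-O and an unindexed N-H fragment is narrowed to graph 1. Only that candidate would then need an exact subgraph-isomorphism test.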
Mining Frequent Patterns in Data Streams at Multiple Time Granularities
This paper proposes computing and maintaining all the frequent patterns, dynamically updating them as new data arrives in the stream, and incrementally maintaining tilted-time windows for each pattern at multiple time granularities.
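A tilted-time window keeps recent history at fine granularity and older history at coarser granularity. A minimal sketch of a natural tilted-time window for one pattern's counts follows. Its levels (4 quarter-hours, 24 hours, 31 days) and the rule that each completed group of finer units rolls up into one coarser unit are illustrative assumptions, not the paper's exact data structure:

```python
from collections import deque

class TiltedTimeWindow:
    """Hypothetical natural tilted-time window for one pattern's counts.

    Level 0 keeps recent quarter-hour counts, level 1 hourly totals,
    level 2 daily totals (capacities and ratios are illustrative assumptions).
    """

    def __init__(self, capacities=(4, 24, 31), ratios=(4, 24)):
        # capacities[i]: how many units level i retains (older ones drop off)
        # ratios[i]: how many level-i units roll up into one level-(i+1) unit
        assert len(ratios) == len(capacities) - 1
        self.levels = [deque(maxlen=c) for c in capacities]
        self.ratios = ratios
        self.partial = [0] * len(ratios)  # running total toward the next coarser unit
        self.seen = [0] * len(ratios)     # finer units accumulated since the last roll-up

    def add(self, count, level=0):
        """Record the count for one newly completed unit at `level`."""
        self.levels[level].appendleft(count)  # most recent unit first
        if level < len(self.ratios):
            self.partial[level] += count
            self.seen[level] += 1
            if self.seen[level] == self.ratios[level]:
                total = self.partial[level]
                self.partial[level] = 0
                self.seen[level] = 0
                self.add(total, level + 1)  # promote the completed coarser unit

    def snapshot(self):
        return [list(level) for level in self.levels]
```

After feeding eight quarter-hour counts 3, 1, 2, 5, 4, 0, 1, 2, the window holds the last four quarters and two hourly totals, 7 and 11, most recent first. Memory per pattern stays bounded by the sum of the capacities, no matter how long the stream runs.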
SOBER: statistical model-based bug localization
The results demonstrate the power of the approach in bug localization: SOBER helps programmers locate 68 out of 130 bugs in the Siemens suite when they examine no more than 10% of the code, whereas the best previously reported result is 52 out of 130.
Statistical Debugging: A Hypothesis Testing-Based Approach
- Chao Liu, Long Fei, Xifeng Yan, Jiawei Han, S. Midkiff
- Computer Science · IEEE Transactions on Software Engineering
- 1 October 2006
A new statistical method, called SOBER, is proposed, which automatically localizes software faults without any prior knowledge of the program semantics and models the predicate evaluation in both correct and incorrect executions.
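One way to model predicate evaluation in a run is through its evaluation bias: the fraction of the predicate's evaluations that were true. Predicates whose bias distribution shifts between correct and incorrect executions are then ranked as likely fault locations. A minimal sketch under that assumption. The |z| ranking below is a simplified stand-in, not SOBER's exact statistic, and the predicate names and counts are hypothetical:

```python
import math
from statistics import mean, stdev

def evaluation_bias(n_true, n_false):
    """Fraction of a run's evaluations in which the predicate was true (None if never evaluated)."""
    total = n_true + n_false
    return n_true / total if total else None

def bias_shift_score(passing_runs, failing_runs):
    """|z| of the mean failing-run bias against the passing-run bias distribution.

    Simplified stand-in for SOBER's ranking; each run is an (n_true, n_false) pair.
    """
    p = [b for b in (evaluation_bias(t, f) for t, f in passing_runs) if b is not None]
    q = [b for b in (evaluation_bias(t, f) for t, f in failing_runs) if b is not None]
    if len(p) < 2 or not q:
        return 0.0  # not enough evidence to compare the two distributions
    sigma = max(stdev(p), 1e-9)  # guard against a zero-variance passing distribution
    z = (mean(q) - mean(p)) / (sigma / math.sqrt(len(q)))
    return abs(z)

# Hypothetical per-run counts: predicate -> (passing runs, failing runs).
RUNS = {
    "x > 0": ([(1, 9), (2, 8), (1, 9), (2, 8)], [(9, 1), (8, 2)]),
    "i < n": ([(5, 5), (4, 6), (6, 4), (5, 5)], [(5, 5), (4, 6)]),
}
ranking = sorted(RUNS, key=lambda name: bias_shift_score(*RUNS[name]), reverse=True)
```

Here `x > 0` is almost always false in passing runs but mostly true in failing ones, so it ranks first. No knowledge of what the predicate means is needed, only its evaluation counts.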