SPADE: An Efficient Algorithm for Mining Frequent Sequences
TLDR
SPADE is a new algorithm for the fast discovery of sequential patterns. It uses combinatorial properties to decompose the original problem into smaller sub-problems that can be solved independently in main memory using efficient lattice search techniques and simple join operations.
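The "simple join operations" can be illustrated with a vertical id-list layout. The sketch below is a hedged illustration, not the SPADE implementation: it assumes each item's id-list holds (sequence id, event id) pairs, and the helper names `vertical_idlists`, `temporal_join` and `support` are hypothetical. The id-list of the 2-sequence `A -> B` keeps the occurrences of `B` that come strictly after some `A` in the same sequence, and support counts distinct sequences.

```python
from collections import defaultdict

# Illustrative sketch only (not SPADE's code); helper names are hypothetical.
# An id-list holds (sid, eid) pairs: the sequence id and event (time) id of each occurrence.
IdList = list[tuple[int, int]]


def vertical_idlists(db: dict[int, list[tuple[int, set[str]]]]) -> dict[str, IdList]:
    """Convert a horizontal sequence database into per-item id-lists."""
    idlists: dict[str, IdList] = defaultdict(list)
    for sid, events in db.items():
        for eid, items in events:
            for item in items:
                idlists[item].append((sid, eid))
    return dict(idlists)


def temporal_join(prefix: IdList, item: IdList) -> IdList:
    """Id-list of `prefix -> item`: occurrences of `item` strictly after some prefix occurrence."""
    # Only the earliest prefix occurrence in each sequence matters for "happens later".
    earliest: dict[int, int] = {}
    for sid, eid in prefix:
        earliest[sid] = min(eid, earliest.get(sid, eid))
    return [(sid, eid) for sid, eid in item if sid in earliest and eid > earliest[sid]]


def support(idlist: IdList) -> int:
    """Support counts distinct sequences, not individual occurrences."""
    return len({sid for sid, _ in idlist})


db = {
    1: [(10, {"A", "B"}), (20, {"B"}), (30, {"A"})],
    2: [(10, {"B"}), (20, {"A"})],
    3: [(10, {"A"}), (15, {"B"})],
}
idlists = vertical_idlists(db)
a_then_b = temporal_join(idlists["A"], idlists["B"])
print(a_then_b, support(a_then_b))  # [(1, 20), (3, 15)] 2
```

Each join reads only the two id-lists involved, so the database itself never has to be rescanned once the vertical layout is built.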
Scalable Algorithms for Association Mining
TLDR
Efficient algorithms are presented for discovering frequent itemsets, which form the compute-intensive phase of the association mining task, and the effect of combining different database layout schemes with the proposed decomposition and traversal techniques is examined.
New Algorithms for Fast Discovery of Association Rules
TLDR
New algorithms for fast association mining that scan the database only once are presented, addressing the open question of whether all the rules can be efficiently extracted in a single database pass.
CHARM: An Efficient Algorithm for Closed Itemset Mining
TLDR
CHARM is an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets using a dual itemset-tidset search tree and an efficient hybrid search that skips many levels, and it uses a technique called diffsets to reduce the memory footprint of intermediate computations.
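To make the itemset-tidset view concrete, here is a minimal sketch that computes tidsets and the closure of an itemset. It is not CHARM's search: it only shows the quantities such a search works with. The helper names (`vertical`, `tidset_of`, `closure`, `is_closed`) and the toy database are hypothetical. An itemset `X` is closed when the items shared by every transaction in `t(X)` are exactly `X`.

```python
from functools import reduce

# Illustrative sketch of the itemset-tidset (vertical) view, not CHARM itself.
# Helper names and the toy database are hypothetical.


def vertical(db: dict[int, set[str]]) -> dict[str, frozenset[int]]:
    """Map each item to its tidset: the ids of the transactions that contain it."""
    tidsets: dict[str, set[int]] = {}
    for tid, items in db.items():
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return {item: frozenset(tids) for item, tids in tidsets.items()}


def tidset_of(itemset, tidsets, all_tids: frozenset) -> frozenset:
    """t(X): intersect the tidsets of X's items (t of the empty set is every tid)."""
    return reduce(lambda acc, item: acc & tidsets.get(item, frozenset()), itemset, all_tids)


def closure(itemset, db, tidsets) -> frozenset:
    """c(X): the items shared by every transaction that contains X."""
    tids = tidset_of(itemset, tidsets, frozenset(db))
    if not tids:
        # X occurs nowhere; by convention its closure is the whole item universe.
        return frozenset().union(*db.values())
    return frozenset.intersection(*(frozenset(db[t]) for t in tids))


def is_closed(itemset, db, tidsets) -> bool:
    return closure(itemset, db, tidsets) == frozenset(itemset)


db = {
    1: {"A", "C", "T", "W"}, 2: {"C", "D", "W"}, 3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"}, 5: {"A", "C", "D", "T", "W"}, 6: {"C", "D", "T"},
}
t = vertical(db)
print(sorted(closure({"A"}, db, t)))  # ['A', 'C', 'W']
```

Here `{"A"}` and `{"A", "C", "W"}` share the tidset `{1, 3, 4, 5}`, so only the larger set is closed. Spotting equal tidsets like this is one way a search over itemset-tidset pairs can avoid enumerating the intermediate, non-closed itemsets.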
Fast vertical mining using diffsets
TLDR
This paper presents a novel vertical data representation called diffset, which keeps track only of the differences between the tids of a candidate pattern and those of its generating frequent patterns, and shows that diffsets drastically reduce the memory required to store intermediate results.
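The diffset bookkeeping reduces to a few set identities. The sketch below assumes a common prefix `P`, defines `d(PX) = t(P) - t(PX)`, and uses `d(PXY) = d(PY) - d(PX)` with support `sigma(PXY) = sigma(PX) - |d(PXY)|`. The function names and the toy tidsets are hypothetical, and the asserts cross-check the diffset results against plain tidset intersection.

```python
# Illustrative sketch of diffset arithmetic; function names and data are hypothetical.
# Tidsets and diffsets are Python sets of transaction ids.


def diffset_from_tidsets(t_px: set[int], t_py: set[int]) -> set[int]:
    """d(PXY) = t(PX) - t(PY): the tids of PX that are lost when Y is added."""
    return t_px - t_py


def diffset_from_diffsets(d_px: set[int], d_py: set[int]) -> set[int]:
    """d(PXY) = d(PY) - d(PX), where d(PZ) = t(P) - t(PZ); no tidset is needed."""
    return d_py - d_px


def support_from_diffset(sup_px: int, d_pxy: set[int]) -> int:
    """sigma(PXY) = sigma(PX) - |d(PXY)|."""
    return sup_px - len(d_pxy)


all_tids = {1, 2, 3, 4, 5, 6}
t = {"A": {1, 3, 4, 5}, "W": {1, 2, 3, 4, 5}, "T": {1, 3, 5, 6}}

# Diffsets of single items relative to the empty prefix P = {} (so t(P) = all tids).
d = {item: all_tids - tids for item, tids in t.items()}  # d(A) = {2, 6}, d(T) = {2, 4}

d_at = diffset_from_diffsets(d["A"], d["T"])      # d(AT) = {4}
sup_at = support_from_diffset(len(t["A"]), d_at)  # sigma(AT) = 3

# Cross-check against the tidset formulation.
assert d_at == diffset_from_tidsets(t["A"], t["T"])
assert sup_at == len(t["A"] & t["T"])
```

The stored diffset `{4}` is smaller than the tidset `{1, 3, 5}` it replaces, which is the kind of saving the paper reports on dense data.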
Efficiently mining frequent trees in a forest
TLDR
This work presents TREEMINER, a novel algorithm that discovers all frequent subtrees in a forest using a new data structure called scope-list, and finds that TREEMINER outperforms the pattern-matching approach by a factor of 4 to 20 and has good scaleup properties.
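A scope-list is built from node scopes. For a node at preorder position `l`, the scope is `[l, r]`, where `r` is the position of the last (rightmost) node in its subtree. The sketch below assumes trees given as a label map plus child lists, and the helper names are hypothetical. It computes only the scopes and the containment test that decides whether one node lies below another. A full scope-list entry also records the tree id and the prefix match labels, which are not shown here.

```python
# Illustrative sketch (helper names hypothetical): preorder scopes of the kind used
# by TREEMINER's scope-lists. The scope of a node at preorder position l is [l, r],
# where r is the preorder position of the last node in its subtree.


def preorder_scopes(labels: dict[int, str], children: dict[int, list[int]],
                    root: int) -> list[tuple[str, tuple[int, int]]]:
    result: list = []
    position = 0

    def visit(node: int) -> None:
        nonlocal position
        left = position
        position += 1
        slot = len(result)
        result.append(None)  # placeholder until the subtree's last position is known
        for child in children.get(node, []):
            visit(child)
        result[slot] = (labels[node], (left, position - 1))

    visit(root)
    return result


def is_descendant(ancestor: tuple[int, int], node: tuple[int, int]) -> bool:
    """Scope containment: `node` lies strictly inside `ancestor`'s subtree."""
    (l1, r1), (l2, r2) = ancestor, node
    return l1 < l2 and r2 <= r1


labels = {0: "A", 1: "B", 2: "C", 3: "D"}
children = {0: [1, 2], 1: [3]}
print(preorder_scopes(labels, children, 0))
# [('A', (0, 3)), ('B', (1, 2)), ('D', (2, 2)), ('C', (3, 3))]
```

Because ancestry reduces to comparing two integer intervals, candidate subtrees can be checked without walking the original trees again.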
Efficient algorithms for mining closed itemsets and their lattice structure
TLDR
CHARM is an efficient algorithm for mining all frequent closed itemsets. It uses a dual itemset-tidset search tree and an efficient hybrid search that skips many levels, and it uses a technique called diffsets to reduce the memory footprint of intermediate computations.
Link prediction using supervised learning
TLDR
This research identifies a set of features that are key to the superior performance obtained in a supervised learning setup, and shows that a small subset of features always plays a significant role in the link prediction task.
Data Mining and Analysis: Fundamental Concepts and Algorithms
TLDR
This textbook for senior undergraduate and graduate data mining courses provides a broad yet in-depth overview of data mining, integrating related concepts from machine learning and statistics.
Efficiently mining frequent trees in a forest: algorithms and applications
  • Mohammed J. Zaki
  • Computer Science
  • IEEE Transactions on Knowledge and Data…
  • 1 August 2005
TLDR
This work presents TREEMINER, a novel algorithm that discovers all frequent subtrees in a forest using a new data structure called scope-list. It contrasts TREEMINER with a pattern-matching tree mining algorithm (PATTERNMATCHER) and also compares it with TREEMINERD, a variant that counts only distinct occurrences of a pattern.