SPADE is a new algorithm for fast discovery of sequential patterns that uses combinatorial properties to decompose the original problem into smaller sub-problems, which can be solved independently in main memory using efficient lattice search techniques and simple join operations.
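To make the "simple join operations" concrete, here is a minimal sketch of a temporal join over vertical id-lists. It assumes each item is stored as a list of (sequence id, event id) pairs; the function names and the toy data are illustrative and are not taken from the paper.

```python
def temporal_join(idlist_a, idlist_b):
    """Id-list for the pattern 'a, later b' (hypothetical helper).

    Keeps each occurrence of b that is strictly preceded by some
    occurrence of a within the same sequence.
    """
    first_a = {}
    for sid, eid in idlist_a:
        # Only the earliest a per sequence matters for the "before" test.
        first_a[sid] = min(eid, first_a.get(sid, eid))
    return [(sid, eid) for sid, eid in idlist_b
            if sid in first_a and first_a[sid] < eid]


def sequence_support(idlist):
    """Support is the number of distinct sequences in the id-list."""
    return len({sid for sid, _ in idlist})


# Toy id-lists: (sequence id, event id) pairs, made up for illustration.
idlist_a = [(1, 10), (1, 30), (2, 20), (3, 15)]
idlist_b = [(1, 20), (2, 10), (3, 15), (3, 40)]

a_then_b = temporal_join(idlist_a, idlist_b)
```

The support of the longer pattern is computed from the two shorter id-lists alone, without rescanning the database, which is what allows each sub-problem to be handled in memory.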
Efficient algorithms are presented for the discovery of frequent itemsets, which forms the compute-intensive phase of the association mining task, together with the effect of combining different database layout schemes with the proposed decomposition and traversal techniques.
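As an illustration of how the database layout affects support counting, the sketch below converts a horizontal database (one item set per transaction) into a vertical one (one tidset per item) and counts support by set intersection. The helper names and the six-transaction toy database are assumptions made for this example.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Map each item to the set of transaction ids (tids) containing it."""
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def itemset_support(itemset, tidsets):
    """Support of a non-empty itemset = size of the intersection of its tidsets."""
    tids = set.intersection(*(tidsets.get(i, set()) for i in itemset))
    return len(tids)


# Toy horizontal database, made up for illustration.
transactions = [
    {"A", "C", "T", "W"},
    {"C", "D", "W"},
    {"A", "C", "T", "W"},
    {"A", "C", "D", "W"},
    {"A", "C", "D", "T", "W"},
    {"C", "D", "T"},
]

tidsets = to_vertical(transactions)
support_ac = itemset_support({"A", "C"}, tidsets)
support_dt = itemset_support({"D", "T"}, tidsets)
```

In the vertical layout, counting a candidate touches only the tidsets of its items, whereas the horizontal layout requires a pass over every transaction.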
New algorithms for fast association mining that scan the database only once are presented, addressing the open question of whether all the rules can be efficiently extracted in a single database pass.
CHARM is an efficient algorithm for mining all frequent closed itemsets that enumerates closed sets using a dual itemset-tidset search tree and an efficient hybrid search that skips many levels, and that uses a technique called diffsets to reduce the memory footprint of intermediate computations.
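For readers unfamiliar with closed itemsets, the following brute-force sketch shows the closure operator that defines them: an itemset is closed when it equals the intersection of all transactions that contain it. This is only a definitional illustration on a made-up database, not CHARM's search strategy, and the function names are hypothetical.

```python
def closure(itemset, transactions):
    """Largest itemset shared by every transaction that contains `itemset`."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        # No transaction contains the itemset; return it unchanged.
        return frozenset(itemset)
    return frozenset(set.intersection(*covering))


def is_closed(itemset, transactions):
    """An itemset is closed if no proper superset has the same support."""
    return closure(itemset, transactions) == frozenset(itemset)


# Toy database, made up for illustration.
db = [
    {"A", "C", "T", "W"},
    {"C", "D", "W"},
    {"A", "C", "T", "W"},
    {"A", "C", "D", "W"},
    {"A", "C", "D", "T", "W"},
    {"C", "D", "T"},
]

closure_of_a = closure({"A"}, db)
```

Here every transaction containing A also contains C and W, so {A} is not closed while {A, C, W} is; mining only closed sets avoids reporting such redundant subsets.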
This paper presents a novel vertical data representation called diffset, which keeps track only of the differences between the tids of a candidate pattern and those of its generating frequent patterns, and shows that diffsets drastically cut down the size of memory required to store intermediate results.
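The idea can be sketched as follows, assuming the usual formulation in which the diffset of a candidate PXY is d(PXY) = d(PY) - d(PX) and its support is the support of PX minus the size of that diffset. The function names and the toy tidsets are illustrative only.

```python
def diffset_from_tidsets(t_x, t_y):
    """First-level diffset of XY: tids that contain X but not Y."""
    return t_x - t_y


def join_diffsets(d_px, support_px, d_py):
    """Combine two candidates sharing prefix P into PXY.

    Returns the diffset of PXY and its support, computed without
    materializing any tidset.
    """
    d_pxy = d_py - d_px
    return d_pxy, support_px - len(d_pxy)


# Toy tidsets for single items, made up for illustration.
t_a = {0, 2, 3, 4}
t_d = {1, 3, 4, 5}
t_t = {0, 2, 4, 5}

# Two-item candidates with prefix A.
d_at = diffset_from_tidsets(t_a, t_t)
d_ad = diffset_from_tidsets(t_a, t_d)
support_at = len(t_a) - len(d_at)

# Three-item candidate ATD built from diffsets alone.
d_atd, support_atd = join_diffsets(d_at, support_at, d_ad)
```

Because a diffset stores only the tids lost when extending a pattern, it tends to stay small on dense data where tidsets themselves remain large.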
This work presents TREEMINER, a novel algorithm to discover all frequent subtrees in a forest, using a new data structure called scope-list, and finds that TREEMINER outperforms the pattern matching approach by a factor of 4 to 20, and has good scaleup properties.
CHARM is an efficient algorithm for mining all frequent closed itemsets; it uses a dual itemset-tidset search tree with an efficient hybrid search that skips many levels, and a technique called diffsets to reduce the memory footprint of intermediate computations.
This research identifies a set of features that are key to superior performance in the supervised learning setup, and shows that a small subset of features always plays a significant role in the link prediction task.
This textbook for senior undergraduate and graduate data mining courses provides a broad yet in-depth overview of data mining, integrating related concepts from machine learning and statistics.
This work presents TREEMINER, a novel algorithm to discover all frequent subtrees in a forest using a new data structure called scope-list, contrasts it with a pattern matching tree mining algorithm (PATTERNMATCHER), and compares it with TREEMINERD, which counts only distinct occurrences of a pattern.