Impact of I/O and execution scheduling strategies on large scale parallel data mining
Association rule mining is one of the most important techniques in data mining. It extracts significant patterns from transaction databases and generates rules used in many decision support applications. Industrial, commercial, and even scientific organizations may produce large numbers of transactions and attributes, and mining effective rules from such volumes of data requires considerable time and computing resources. In this paper, we propose a parallel FI-growth association rule mining algorithm for rapid extraction of frequent itemsets from large dense databases. We also show that this algorithm can be parallelized efficiently in a cluster computing environment. Preliminary experiments give promising results, with nearly ideal scaling on small clusters and about half of ideal (15-fold speedup) on a 32-processor cluster.
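The "about half of ideal" figure can be checked with the usual definition of parallel efficiency: speedup divided by processor count. The short sketch below applies it to the numbers quoted above. The function name `parallel_efficiency` is a hypothetical helper introduced only for illustration and is not part of the proposed algorithm.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Return speedup divided by processor count (1.0 means ideal linear scaling)."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# Figures quoted in the abstract: 15-fold speedup on 32 processors.
efficiency = parallel_efficiency(15.0, 32)
print(f"{efficiency:.2f}")  # 0.47
```

An efficiency of roughly 0.47 is consistent with the abstract's description of the 32-processor result as about half of ideal.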