Ratio Rules: A New Paradigm for Fast, Quantifiable Data Mining
Association Rule Mining algorithms operate on a data matrix (e.g., customers products) to derive association rules 2, 23]. We propose a new paradigm, namely, Ratio Rules, which are quantiiable in that we can measure the \goodness" of a set of discovered rules. We propose to use the \guessing error" as a measure of the \goodness", that is, the root-mean-square error of the reconstructed values of the cells of the given matrix, when we pretend that they are unknown. Another contribution is a novel method to guess miss-ing/hidden values from the Ratio Rules that our method derives. For example, if somebody bought $10 of milk and $3 of bread, our rules can \guess" the amount spent on, say, butter. Thus, we can perform a variety of important tasks such as forecasting, answering \what-if" scenarios, detecting outliers, and visualizing the data. Moreover, we show how to compute Ratio Rules in a single pass over the dataset with small memory requirements (a few small matrices), in contrast to traditional association rule mining methods that require multiple passes and/or large memory. Exper-Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment. iments on several real datasets (e.g., basketball and baseball statistics, biological data) demonstrate that the proposed method consistently achieves a \guessing error" of up to 5 times less than the straightforward competitor .