Mining ratio rules via principal sparse non-negative matrix factorization
Association rule mining algorithms operate on a data matrix (e.g., customers × products) to derive association rules [2, 23]. We propose a new paradigm, namely, Ratio Rules, which are quantifiable in that we can measure the "goodness" of a set of discovered rules. We propose to use the "guessing error" as a measure of this "goodness", that is, the root-mean-square error of the reconstructed values of the cells of the given matrix when we pretend that they are unknown. Another contribution is a novel method to guess missing or hidden values from the Ratio Rules that our method derives. For example, if somebody bought $10 of milk and $3 of bread, our rules can "guess" the amount spent on, say, butter. Thus, we can perform a variety of important tasks such as forecasting, answering "what-if" scenarios, detecting outliers, and visualizing the data. Moreover, we show how to compute Ratio Rules in a single pass over the dataset with small memory requirements (a few small matrices), in contrast to traditional association rule mining methods that require multiple passes and/or large memory. Experiments on several real datasets (e.g., basketball and baseball statistics, biological data) demonstrate that the proposed method consistently achieves a "guessing error" up to 5 times lower than that of the straightforward competitor.

Work performed while at the University of Maryland. This research was partially funded by the Institute for Systems Research (ISR), and by the National Science Foundation under Grants No. EEC-94-02384, IRI-9205273 and IRI-9625428.

Proceedings of the 24th VLDB Conference, New York, USA, 1998.
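The abstract names three ingredients: a single pass over the data with a few small matrices, a way to guess hidden cells from the rules, and the "guessing error" as an RMS score. The following is a minimal, stdlib-only sketch of how these could fit together, under stated assumptions rather than as the paper's algorithm. It assumes that Ratio Rules are the top-k eigenvectors of the column covariance matrix (a PCA-style reading), that a hidden cell is guessed by a least-squares fit of the row's known cells onto those rules, and that the guessing error hides one cell at a time. All function names are hypothetical, and power iteration with deflation stands in for a proper eigensolver.

```python
import math

# Hypothetical sketch: Ratio Rules read as top-k covariance eigenvectors (assumption).

def single_pass_stats(rows):
    """Scan the data once, keeping only n, column sums and X^T X (an M x M matrix)."""
    n, sums, xtx = 0, None, None
    for row in rows:
        if sums is None:
            m = len(row)
            sums = [0.0] * m
            xtx = [[0.0] * m for _ in range(m)]
        n += 1
        for a, xa in enumerate(row):
            sums[a] += xa
            for b, xb in enumerate(row):
                xtx[a][b] += xa * xb
    return n, sums, xtx

def covariance(n, sums, xtx):
    """Column means and (population) covariance derived from the one-pass statistics."""
    mean = [s / n for s in sums]
    m = len(mean)
    cov = [[xtx[a][b] / n - mean[a] * mean[b] for b in range(m)] for a in range(m)]
    return mean, cov

def top_eigenvectors(cov, k, iters=500):
    """Power iteration with deflation (a stand-in for a library eigensolver)."""
    m = len(cov)
    work = [r[:] for r in cov]
    vecs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(m)]  # arbitrary non-zero start vector
        for _ in range(iters):
            w = [sum(work[a][b] * v[b] for b in range(m)) for a in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(work[a][b] * v[b] for b in range(m)) for a in range(m))
        vecs.append(v)
        # Remove the found component so the next iteration finds the next rule.
        work = [[work[a][b] - lam * v[a] * v[b] for b in range(m)] for a in range(m)]
    return vecs

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small k x k system."""
    k = len(b)
    M = [A[i][:] + [b[i]] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        if abs(M[c][c]) < 1e-12:
            raise ValueError("too few known values to determine the rule coefficients")
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * k
    for c in range(k - 1, -1, -1):
        x[c] = (M[c][k] - sum(M[c][j] * x[j] for j in range(c + 1, k))) / M[c][c]
    return x

def guess_missing(row, mean, rules):
    """Fill the None entries of `row` by least-squares fitting its known entries to the rules."""
    known = [j for j, x in enumerate(row) if x is not None]
    k = len(rules)
    # Normal equations (V_K^T V_K) c = V_K^T (x_K - mean_K), restricted to known columns K.
    A = [[sum(rules[p][j] * rules[q][j] for j in known) for q in range(k)] for p in range(k)]
    b = [sum(rules[p][j] * (row[j] - mean[j]) for j in known) for p in range(k)]
    c = solve(A, b)
    return [x if x is not None else mean[j] + sum(c[p] * rules[p][j] for p in range(k))
            for j, x in enumerate(row)]

def guessing_error(rows, mean, rules):
    """RMS error when each cell in turn is hidden and guessed from the rest of its row."""
    sq, count = 0.0, 0
    for row in rows:
        for j in range(len(row)):
            holed = list(row)
            holed[j] = None
            sq += (guess_missing(holed, mean, rules)[j] - row[j]) ** 2
            count += 1
    return math.sqrt(sq / count)
```

For instance, on a hypothetical toy dataset where every basket spends on milk, bread and butter in the ratio 4 : 2 : 1 (`rows = [[4.0 * t, 2.0 * t, 1.0 * t] for t in range(1, 6)]`), one rule suffices, and `guess_missing([10.0, 5.0, None], mean, rules)` fills in approximately 2.5 for butter. The only state kept across the scan is `n`, the column sums and the M × M matrix `X^T X`, so memory grows with the number of columns rather than the number of rows, which is one way to read the abstract's single-pass, small-memory claim.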