Exploring Large Rule Spaces by Sampling

Abstract

A great challenge for data mining techniques is the huge space of potential rules which can be generated. If there are tens of thousands of items, then potential rules involving three items number in the trillions. Traditional data mining techniques rely on downward-closed measures such as support to prune the space of rules. However, in many applications, such pruning techniques either do not sufficiently reduce the space of rules, or they are overly restrictive. We propose a new solution to this problem, called Dynamic Data Mining (DDM). DDM foregoes the completeness offered by traditional techniques based on downward-closed measures in favor of the ability to drill deep into the space of rules and provide the user with a better view of the structure present in a data set. Instead of a single deterministic run, DDM runs continuously, exploring more and more of the rule space. Instead of using a downward-closed measure such as support to guide its exploration, DDM uses a user-defined measure called weight, which is not restricted to be downward closed. The exploration is guided by a heuristic called the Heavy Edge Property. The system incorporates user feedback by allowing weight to be redefined dynamically. We test the system on a particularly difficult data set: the word usage in a large subset of the World Wide Web. We find that Dynamic Data Mining is an effective tool for mining such difficult data sets.
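
As a rough check on the scale claimed in the abstract, the number of unordered three-item combinations can be computed directly. This is only an illustrative sketch: the helper name `count_itemsets` and the sample item counts are assumptions, not taken from the paper, and the count covers unordered item sets only (counting directed rules would give a larger figure).

```python
from math import comb


def count_itemsets(num_items: int, size: int) -> int:
    """Number of unordered item sets of the given size (hypothetical helper)."""
    return comb(num_items, size)


# Sample item counts chosen for illustration; the abstract says only
# "tens of thousands of items".
for n in (20_000, 50_000):
    print(f"{n:>6} items -> {count_itemsets(n, 3):.3e} three-item sets")
```

With 20,000 items the count is about 1.3 trillion, which matches the order of magnitude stated in the abstract.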

Cite this paper

@inproceedings{Brin1999ExploringLR, title={Exploring Large Rule Spaces by Sampling}, author={Sergey Brin}, year={1999} }