Dynamic itemset counting and implication rules for market basket data

@inproceedings{Brin1997DynamicIC,
  title={Dynamic itemset counting and implication rules for market basket data},
  author={Sergey Brin and Rajeev Motwani and Jeffrey D. Ullman and Shalom Tsur},
  booktitle={SIGMOD '97},
  year={1997}
}
We consider the problem of analyzing market-basket data and present several important contributions. [...] Second, we present a new way of generating “implication rules,” which are normalized based on both the antecedent and the consequent and are truly implications (not simply a measure of co-occurrence), and we show how they produce more intuitive results than other methods. Finally, we show how different characteristics of real data, as opposed to synthetic data, can dramatically affect the…
A new framework for itemset generation
TLDR
An algorithm is presented that offers very good computational efficiency while maintaining statistical robustness, which implies that the method can be applied to find association rules in datasets in which items may appear in a sizeable percentage of the transactions (dense datasets), datasets in which the items have varying density, or even negative association rules.
Mining Market Basket Data Using Share Measures and Characterized Itemsets
TLDR
The share-confidence framework for knowledge discovery from databases is proposed which addresses the problem of mining itemsets from market basket data and suggests how characterized itemsets can be generalized according to concept hierarchies associated with the characteristic attributes.
Mining Associations with the Collective Strength Approach
TLDR
An algorithm is presented that offers very good computational efficiency while maintaining statistical robustness; because it relies on relative measures rather than absolute measures such as support, the method can be applied to find association rules in data sets in which items may appear in a sizeable percentage of the transactions.
A New Approach for the Discovery of Frequent Itemsets
  • R. Meo
  • Computer Science
    DaWaK
  • 1999
TLDR
An algorithm that requires only one pass over the database, scales linearly with the dimensions of the database and, as shown by the experiments, performs better than other classical algorithms.
Mining frequent itemsets in data streams within a time horizon
TLDR
The experimental results prove that the proposed algorithm for mining frequent itemsets in a stream of transactions within a limited time horizon is faster than other approaches but has a slightly higher cost in terms of memory.
Scalable APRIORI-Based Frequent Pattern Discovery
TLDR
This paper takes the classic algorithm for the frequent pattern discovery problem, A Priori, and, by adding a vertical sort, drastically improves its performance characteristics when processing very large datasets.
UNIC : UNique Item Counts for Association Rule Mining in Relational Data
Association rule mining (ARM) can be generalized to relational data by using joined relations as basis. We demonstrate that typically such an approach results in an overwhelming number of rules that
Advances in Mining Binary Data: Itemsets as Summaries
TLDR
This thesis shows how to use itemsets for answering queries, that is, finding out the number of transactions satisfying some given formula, and proposes a new concept called normalised correlation dimension, a known concept that works well with real-valued data.
Market basket analysis with networks
TLDR
It is demonstrated that the network-based approach can concisely isolate influence among products, mitigating the need to search through massive lists of association rules, and an interestingness measure for communities of products is developed and shown to isolate useful, actionable communities.
Extracting Share Frequent Itemsets with Infrequent Subsets
TLDR
This work defines the problem of finding share frequent itemsets, and shows that share frequency does not have the property of downward closure when it is defined in terms of the itemset as a whole.

References

Fast Algorithms for Mining Association Rules
TLDR
Two new algorithms for solving this problem that are fundamentally different from the known algorithms are presented, and empirical evaluation shows that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems.
Mining sequential patterns
  • R. Agrawal, R. Srikant
  • Computer Science
    Proceedings of the Eleventh International Conference on Data Engineering
  • 1995
TLDR
Three algorithms are presented to solve the problem of mining sequential patterns over databases of customer transactions, and empirically evaluating their performance using synthetic data shows that two of them have comparable performance.
Sampling Large Databases for Association Rules
TLDR
New algorithms that reduce the database activity considerably by picking a random sample, using this sample to find all association rules that probably hold in the whole database, and then verifying the results against the rest of the database.
Mining generalized association rules
TLDR
A new interest-measure for rules which uses the information in the taxonomy is presented, and given a user-specified “minimum-interest-level”, this measure prunes a large number of redundant rules.
Mining association rules between sets of items in large databases
TLDR
An efficient algorithm is presented that generates all significant association rules between items in the database of customer transactions and incorporates buffer management and novel estimation and pruning techniques.
SLIQ: A Fast Scalable Classifier for Data Mining
TLDR
Issues in building a scalable classifier are discussed and the design of SLIQ, a new classifier that uses a novel pre-sorting technique in the tree-growth phase to enable classification of disk-resident datasets is presented.
Database Mining: A Performance Perspective
TLDR
The authors' perspective of database mining as the confluence of machine learning techniques and the performance emphasis of database technology is presented and an algorithm for classification obtained by combining the basic rule discovery operations is given.
Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases
We introduce a new model of similarity of time sequences that captures the intuitive notion that two sequences should be considered similar if they have enough non-overlapping time-ordered pairs of
Proc. of the Int'l Conf. on Very Large Data Bases (VLDB)
  • Proc. of the Int'l Conf. on Very Large Data Bases (VLDB)
  • 1995