Mining All Non-derivable Frequent Itemsets

Toon Calders and Bart Goethals

Recent studies on frequent itemset mining algorithms resulted in significant performance improvements. However, if the minimal support threshold is set too low, or the data is highly correlated, the number of frequent itemsets itself can be prohibitively large. To overcome this problem, recently several proposals have been made to construct a concise representation of the frequent itemsets, instead of mining all frequent itemsets. The main goal of this paper is to identify redundancies in the… 

Depth-First Non-Derivable Itemset Mining

A depth-first algorithm, dfNDI, that is based on Eclat for mining the non-derivable itemsets is presented, and experiments show that dfNDI outperforms NDI by an order of magnitude.

Non-derivable itemset mining

This paper constructs a condensed representation of all frequent itemsets by removing those itemsets whose support can be derived, resulting in the so-called Non-Derivable Itemsets (NDI) representation.

Non-Almost-Derivable Frequent Itemsets Mining

This paper proposes a new condensed representation called frequent non-almost-derivable itemsets, a subset of the original collection of frequent itemsets. From this representation, a lower and an upper bound on the support of any frequent itemset can be derived, and the gap between the two bounds is controlled by a user-defined parameter.

A False Negative Maximal Frequent Itemset Mining Algorithm over Stream

This paper focuses on mining maximal frequent itemsets approximately over a stream landmark model, and proposes an efficient algorithm named FNMFIMoDS, which achieves a faster speed and a much reduced memory cost in comparison with the state-of-the-art algorithm.

Efficient Computation of Partial-Support for Mining Interesting Itemsets

This paper addresses the problem of efficiently calculating partial supports, which leads to efficient algorithms for mining interesting itemsets in that class, and shows that there exists a recurrence relation between partial supports.

Dense itemsets

This paper addresses the problem of computing all dense itemsets in a database, gives a levelwise algorithm for this problem, and studies the top-$k$ variations, i.e., finding the $k$ densest sets with a given support, or the $k$ best-supported sets with a given density.

Deducing Bounds on the Support of Itemsets

  • T. Calders
  • Computer Science
    Database Support for Data Mining Applications
  • 2004
A complete set of rules is given for deducing tight bounds on the support of an itemset when the supports of all its subsets are known, along with a way to reduce the size of an adequate representation of the collection of frequent sets.
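
As a rough illustration of such deduction rules, the sketch below applies the inclusion-exclusion bounds commonly associated with non-derivable itemsets: for each proper subset X of an itemset I, the alternating sum of the supports of all J with X ⊆ J ⊂ I is an upper bound on the support of I when |I \ X| is odd and a lower bound when it is even. The function name `support_bounds` and the dictionary layout are assumptions made for this example, not an interface from the paper, and the brute-force enumeration is only practical for small itemsets.

```python
from itertools import combinations

def support_bounds(itemset, supp):
    """Return (lower, upper) bounds on the support of `itemset`.

    `supp` (hypothetical layout) maps frozenset -> support for every proper
    subset of `itemset`, including the empty set, whose support is the
    number of transactions.
    """
    I = frozenset(itemset)
    lower, upper = 0, supp[frozenset()]
    for r in range(len(I)):                      # X ranges over proper subsets of I
        for X in combinations(sorted(I), r):
            X = frozenset(X)
            rest = sorted(I - X)
            total = 0
            for k in range(len(rest)):           # J ranges over X <= J < I
                for extra in combinations(rest, k):
                    J = X | frozenset(extra)
                    sign = 1 if (len(I) - len(J)) % 2 == 1 else -1
                    total += sign * supp[J]
            if len(rest) % 2 == 1:               # |I \ X| odd: upper bound
                upper = min(upper, total)
            else:                                # |I \ X| even: lower bound
                lower = max(lower, total)
    return lower, upper

# 10 transactions, supp(a) = 6, supp(b) = 7
example = {frozenset(): 10, frozenset("a"): 6, frozenset("b"): 7}
print(support_bounds("ab", example))  # (3, 6)
```

When the two bounds coincide, the support of the itemset is fully determined by its subsets, which is what makes an itemset derivable.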

Finding Top-k Fuzzy Frequent Itemsets from Databases

Theoretical analysis and experimental studies over 4 datasets demonstrate that the proposed algorithm can efficiently decrease the runtime and memory cost, and significantly outperform the naive algorithm Top-k-FFI-Miner.

Mining summarization of high utility itemsets




Deducing Bounds on the Frequency of Itemsets

A complete set of rules is given for deducing tight bounds on the frequency of an itemset when the frequencies of all its subsets are known; these rules allow for reducing data access and providing a more compact output.

Discovering Frequent Closed Itemsets for Association Rules

This paper proposes a new algorithm, called A-Close, using a closure mechanism to find frequent closed itemsets, and shows that this approach is very valuable for dense and/or correlated data that represent an important part of existing databases.
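
For readers unfamiliar with the closure notion, here is a minimal brute-force sketch of the definition, not of A-Close itself: the closure of an itemset is the intersection of all transactions that contain it, and an itemset is closed when it equals its own closure. The names `closure` and `is_closed` are illustrative assumptions, and returning the itemset unchanged when no transaction contains it is a simplification.

```python
def closure(itemset, transactions):
    """Intersection of all transactions containing `itemset` (brute force)."""
    itemset = frozenset(itemset)
    covering = [frozenset(t) for t in transactions if itemset <= frozenset(t)]
    if not covering:
        # Simplifying assumption: an unsupported itemset is returned unchanged.
        return itemset
    return frozenset.intersection(*covering)

def is_closed(itemset, transactions):
    """An itemset is closed if it equals its closure."""
    return closure(itemset, transactions) == frozenset(itemset)

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
print(sorted(closure({"c"}, transactions)))   # ['a', 'c']
print(is_closed({"a", "c"}, transactions))    # True
```

In this toy data every transaction containing `c` also contains `a`, so `{c}` is not closed and is represented by `{a, c}` with the same support.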

CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets

An efficient algorithm, CLOSET, for mining closed itemsets is proposed, with the development of three techniques, including applying a compressed frequent pattern tree (FP-tree) structure for mining closed itemsets without candidate generation, and developing a single prefix path compression technique to identify frequent closed itemsets quickly.

Approximation of Frequency Queries by Means of Free-Sets

It is shown that frequent free-sets can be efficiently extracted using pruning strategies developed for frequent item-set discovery, and that they can be used to approximate the support of any frequent itemset.

Frequent Closures as a Concise Representation for Binary Data Mining

The concept of almost-closure is introduced: generating every frequent set from frequent almost-closures remains possible, but with a bounded error on frequency. To the best of the authors' knowledge this is a new concept, and some experimental evidence of its added value is provided.

Concise representation of frequent patterns based on disjunction-free generators

  • Marzena Kryszkiewicz
  • Computer Science
    Proceedings 2001 IEEE International Conference on Data Mining
  • 2001
A new lossless representation of frequent patterns based on disjunction-free generators is offered that is more concise than two of the basic representations and more efficiently computable than the third representation.

Multiple Uses of Frequent Sets and Condensed Representations (Extended Abstract)

This paper shows how frequent sets can be used as a condensed representation for answering various types of queries, defines a general notion of condensed representations, and shows that frequent sets, samples and the data cube can be viewed as instantiations of this concept.

A condensed representation to find frequent patterns

This paper shows that a condensed representation of the frequent patterns called disjunction-free sets can be used to regenerate all frequent patterns and their exact frequencies, and this regeneration can be performed without any access to the original data.

Mining frequent patterns with counting inference

It is shown that the support of frequent non-key patterns can be inferred from frequent key patterns without accessing the database, and PASCAL is among the most efficient algorithms for mining frequent patterns.
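
The inference step can be sketched as follows, under the usual reading of key patterns: a pattern is non-key when its support equals that of one of its proper subsets, any superset of a non-key pattern is also non-key, and the support of a non-key pattern equals the minimum support among its subsets with one item removed. The function `infer_support` and its arguments are hypothetical names for this illustration and do not reproduce PASCAL's actual data structures.

```python
from itertools import combinations

def infer_support(pattern, supp, non_key):
    """Infer the support of `pattern` without touching the database.

    `supp` maps each (k-1)-subset of `pattern` to its known support;
    `non_key` is the set of patterns already known to be non-key.
    Returns None when no subset is non-key, i.e. the pattern must be counted.
    """
    pattern = frozenset(pattern)
    subsets = [frozenset(s)
               for s in combinations(sorted(pattern), len(pattern) - 1)]
    if any(s in non_key for s in subsets):
        return min(supp[s] for s in subsets)
    return None

# Toy data: transactions ab, ab, abc, ac, c give the supports below,
# and supp(ab) = supp(b) = 3, so ab is non-key.
supp = {frozenset("ab"): 3, frozenset("ac"): 2, frozenset("bc"): 1}
print(infer_support("abc", supp, {frozenset("ab")}))  # 1
```

In the toy data `abc` occurs in exactly one transaction, matching the inferred value.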

Fast algorithms for mining association rules

Two new algorithms for solving this problem that are fundamentally different from the known algorithms are presented, and empirical evaluation shows that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems.