Mining association rules between sets of items in large databases
An efficient algorithm is presented that generates all significant association rules between items in the database of customer transactions and incorporates buffer management and novel estimation and pruning techniques.
Fast algorithms for mining association rules
Two new algorithms for solving this problem that are fundamentally different from the known algorithms are presented, and empirical evaluation shows that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems.
Mining sequential patterns
Three algorithms are presented to solve the problem of mining sequential patterns over databases of customer transactions, and an empirical evaluation of their performance using synthetic data shows that two of them have comparable performance.
Mining Sequential Patterns: Generalizations and Performance Improvements
This work adds time constraints that specify a minimum and/or maximum time period between adjacent elements in a pattern, and relaxes the restriction that the items in an element of a sequential pattern must come from the same transaction.
Fast Discovery of Association Rules
Automatic subspace clustering of high dimensional data for data mining applications
CLIQUE, a clustering algorithm, is presented that satisfies each of these requirements of data mining applications, including the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records.
Efficient Similarity Search In Sequence Databases
An indexing method for time sequences is proposed that uses R*-trees to index the sequences and efficiently answer similarity queries; experimental results show that the method is superior to search based on sequential scanning.
Order preserving encryption for numeric data
This work presents an order-preserving encryption scheme for numeric data that allows any comparison operation to be directly applied on encrypted data, and is robust against estimation of the true value in such environments.
Diversifying search results
This work proposes an algorithm that approximates this objective well in general and is provably optimal for a natural special case, and generalizes several classical IR metrics, including NDCG, MRR, and MAP, to explicitly account for the value of diversification.