Rakesh Agrawal

We are given a large database of customer transactions, where each transaction consists of customer-id, transaction time, and the items bought in the transaction. We introduce the problem of mining sequential patterns over such databases. We present three algorithms to solve this problem, and empirically evaluate their performance using synthetic data. Two of...
The problem of mining sequential patterns was recently introduced in [3]. We are given a database of sequences, where each sequence is a list of transactions ordered by transaction-time, and each transaction is a set of items. The problem is to discover all sequential patterns with a user-specified minimum support, where the support of a pattern is the...
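As a rough illustration of the setting described above, the following minimal sketch counts the support of one candidate pattern. It assumes, since the truncated abstract does not confirm it, that support is the number of data sequences containing the pattern, and that a sequence contains a pattern when each pattern element is a subset of a distinct transaction, in order. The names `contains` and `support` are hypothetical and are not taken from the paper.

```python
def contains(sequence, pattern):
    """Return True if `pattern` occurs in `sequence` as an ordered sub-sequence.

    Both arguments are lists of sets of items. Each pattern element must be
    a subset of a distinct transaction, and the matches must keep their order.
    """
    pos = 0
    for element in pattern:
        # Greedily advance to the earliest transaction that covers this element.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next element must match a strictly later transaction
    return True


def support(database, pattern):
    """Assumed definition: number of data sequences that contain the pattern."""
    return sum(1 for sequence in database if contains(sequence, pattern))
```

Greedy earliest matching is sufficient here because taking the earliest covering transaction never rules out a later match. A full miner would also have to generate candidate patterns, which this sketch does not attempt.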
We propose an indexing method for time sequences for processing similarity queries. We use the Discrete Fourier Transform (DFT) to map time sequences to the frequency domain, the crucial observation being that, for most sequences of practical interest, only the first few frequencies are strong. Another important observation is Parseval's theorem, which...
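The two observations above can be sketched in a few lines. With the unitary normalization (dividing by the square root of the sequence length), Parseval's theorem implies that the DFT preserves Euclidean distance, so a distance computed on only the first `k` coefficients can never exceed the true distance. This is a minimal sketch under those assumptions; the names `dft`, `euclidean`, and `features` and the parameter `k` are illustrative, and the index structure itself is not shown.

```python
import cmath
import math


def dft(x):
    """Unitary DFT: scaled by 1/sqrt(n) so that Euclidean norms are preserved."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        / math.sqrt(n)
        for f in range(n)
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length real or complex vectors."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)))


def features(x, k):
    """Keep only the first k DFT coefficients as a low-dimensional feature vector."""
    return dft(x)[:k]
```

Because the feature-space distance is a lower bound on the true distance, filtering candidates by the truncated coefficients may return false hits but no false dismissals; the surviving candidates would still need to be checked against the full sequences.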
A fruitful direction for future data mining research will be the development of techniques that incorporate privacy concerns. Specifically, we address the following question. Since the primary task in data mining is the development of models about aggregated data, can we develop accurate models without access to precise information in individual data...
Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering...
We study the problem of answering ambiguous web queries in a setting where there exists a taxonomy of information, and that both queries and documents may belong to more than one category according to this taxonomy. We present a systematic approach to diversifying results that aims to minimize the risk of dissatisfaction of the average user. We propose an...
Encryption is a well established technology for protecting sensitive data. However, once encrypted, data can no longer be easily queried aside from exact matches. We present an order-preserving encryption scheme for numeric data that allows any comparison operation to be directly applied on encrypted data. Query results produced are sound (no false hits)...