This paper considers the problem of publishing "transaction data" for research purposes. Each transaction is an arbitrary set of items chosen from a large universe. Detailed transaction data provides an electronic image of one's life. This has two implications. One, transaction data are excellent candidates for data mining research. Two, use of transaction…
Indexing microblogs for real-time search is challenging given the efficiency issue caused by the tremendous speed at which new microblogs are created by users. Existing approaches address this efficiency issue at the cost of query accuracy, as they either (i) exclude a significant portion of microblogs from the index to reduce update cost or (ii) rank…
Mining frequent patterns is a fundamental and important problem in many data mining applications. Many of the algorithms adopt the pattern growth approach, which has been shown to be significantly superior to the candidate generate-and-test approach. In this paper, we identify the key factors that influence the performance of the pattern growth approach, and…
Mining frequent patterns, including mining frequent closed patterns or maximal patterns, is a fundamental and important problem in the data mining area. Many algorithms adopt the pattern growth approach, which has been shown to be superior to the candidate generate-and-test approach, especially when long patterns exist in the datasets. In this paper, we identify the…
Group-based anonymization is the most widely studied approach for privacy-preserving data publishing. Privacy models and definitions that use group-based anonymization include k-anonymity, l-diversity, and t-closeness, to name a few. The goal of this article is to raise a fundamental issue regarding the privacy exposure of the approaches…
We consider the problem of publishing sensitive transaction data with privacy preservation. The high dimensionality of transaction data poses unique challenges for data privacy and data utility. On one hand, re-identification attacks tend to use a subset of items that infrequently occur in transactions, called moles. On the other hand, data mining applications…
Temporal data are time-critical in that the snapshot at each timestamp must be made available to researchers in a timely fashion. However, due to the limited data, each snapshot likely has a skewed distribution on sensitive values, which renders classical anonymization methods inapplicable. In this work, we propose the “reposition model” to…
In this paper, we propose a new framework for mining frequent patterns from large transactional databases. The core of the framework is a novel coded prefix-path tree with two representations, namely, a memory-based prefix-path tree and a disk-based prefix-path tree. The disk-based prefix-path tree is simple in its data structure yet rich in information…