Learn More
This paper considers the problem of publishing "transaction data" for research purposes. Each transaction is an arbitrary set of items chosen from a large universe. Detailed transaction data provides an electronic image of one's life. This has two implications. One, transaction data are excellent candidates for data mining research. Two, use of transaction(More)
Mining frequent patterns is a fundamental and important problem in many data mining applications. Many of the algorithms adopt the pattern growth approach, which is shown to be superior to the candidate generate-and-test approach significantly. In this paper, we identify the key factors that influence the performance of the pattern growth approach, and(More)
Biosequences typically have a small alphabet, a long length, and patterns containing gaps (i.e., "don't care") of arbitrary size. Mining frequent patterns in such sequences faces a different type of explosion than in transaction sequences primarily motivated in market-basket analysis. In this paper, we study how this explosion affects the classic sequential(More)
We consider the problem of publishing sensitive transaction data with privacy preservation. High dimensionality of transaction data poses unique challenges on data privacy and data utility. On one hand, re-identification attacks tend to use a subset of items that infrequently occur in transactions, called moles. On the other hand, data mining applications(More)
Group based anonymization is the most widely studied approach for privacy-preserving data publishing. Privacy models/definitions using group based anonymization includes <i>k</i>-anonymity, <i>l</i>-diversity, and <i>t</i>-closeness, to name a few. The goal of this article is to raise a fundamental issue regarding the privacy exposure of the approaches(More)
— While previous works on privacy-preserving serial data publishing consider the scenario where sensitive values may persist over multiple data releases, we find that no previous work has sufficient protection provided for sensitive values that can change over time, which should be the more common case. In this work, we propose to study the privacy(More)
In this paper, we propose a new framework for mining frequent patterns from large transactional databases. The core of the framework is of a novel coded prefix-path tree with two representations, namely, a memory-based prefix-path tree and a disk-based prefix-path tree. The disk-based prefix-path tree is simple in its data structure yet rich in information(More)