Learn More
In this paper, we present a novel algorithm Opportune Project for mining complete set of frequent item sets by projecting databases to grow a frequent item set tree. Our algorithm is fundamentally different from those proposed in the past in that it opportunistically chooses between two different structures, array-based or tree-based, to represent projected(More)
— Publishing person specific data while protecting privacy is an important problem. Existing algorithms that enforce the privacy principle called l-diversity are heuristic based due to the NP-hardness. Several questions remain open: can we get a significant gain in the data utility from an optimal solution compared to heuristic ones; can we improve the(More)
—Utility mining emerged recently to address the limitation of frequent itemset mining by introducing interest-ingness measures that reflect both the statistical significance and the user's expectation. Among utility mining problems, utility mining with the itemset share framework is a hard one as no anti-monotone property holds with the interestingness(More)
Privacy protection in publishing transaction data is an important problem. A key feature of transaction data is the extreme spar-sity, which renders any single technique ineffective in anonymizing such data. Among recent works, some incur high information loss, some result in data hard to interpret, and some suffer from performance drawbacks. This paper(More)
Medical systems allow patients to receive care at different hospitals. However, this entails considerable inconvenience through the need to transport patients and their medical records between hospitals. The development of Telecare Medicine Information Systems (TMIS) makes it easier for patients to seek medical treatment and to store and access medical(More)
—Web query logs provide a rich wealth of information , but also present serious privacy risks. We consider publishing vocabularies, bags of query-terms extracted from web query logs, which has a variety of applications. We aim at preventing identity disclosure of such bag-valued data. The key feature of such data is the extreme sparsity, which renders(More)
Web query logs provide a rich wealth of information, but also present serious privacy risks. We preserve privacy in publishing vocabularies extracted from a web query log by introducing vocabulary k-anonymity, which prevents the privacy attack of re-identification that reveals the real identities of vocabularies. A vocabulary is a bag of query-terms(More)