k-Anonymity is a privacy-preserving method for limiting the disclosure of private information in data mining. Anonymizing a database table typically involves generalizing table entries and consequently incurs a loss of relevant information. This motivates the search for anonymization algorithms that achieve the required level of anonymization ...
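To make the trade-off between generalization and information loss concrete, here is a minimal sketch of a k-anonymity check. The records, the field names (`age`, `zip`, `diagnosis`), and the two generalization rules are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from collections import Counter

def generalize_age(age, width=10):
    """Replace an exact age with a range such as '30-39' (assumed rule)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def generalize_zip(zip_code, keep=3):
    """Keep the first `keep` characters and mask the rest (assumed rule)."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in counts.values())

# Hypothetical toy table.
records = [
    {"age": 31, "zip": "47677", "diagnosis": "flu"},
    {"age": 36, "zip": "47602", "diagnosis": "asthma"},
    {"age": 52, "zip": "47905", "diagnosis": "flu"},
    {"age": 57, "zip": "47909", "diagnosis": "diabetes"},
]

# Generalize the quasi-identifiers; the sensitive attribute is left untouched.
generalized = [
    {**r, "age": generalize_age(r["age"]), "zip": generalize_zip(r["zip"])}
    for r in records
]

print(is_k_anonymous(records, ["age", "zip"], k=2))
print(is_k_anonymous(generalized, ["age", "zip"], k=2))
```

The raw table fails the check because every row is unique on (`age`, `zip`), while the generalized table passes for k = 2 at the cost of coarser ages and ZIP codes.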
Entity resolution is the process of discovering groups of tuples that correspond to the same real-world entity. To avoid the prohibitively expensive comparison of all pairs of tuples, blocking algorithms separate the tuples into blocks that are highly likely to contain matching pairs. Tuning is a major challenge in the blocking process. In ...
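The blocking idea can be sketched as follows. The blocking key (first three letters of a surname) and the sample records are assumptions made for illustration, not the method of the paper.

```python
from collections import defaultdict
from itertools import combinations

def blocking_key(record):
    """Hypothetical key: first three letters of the lowercased surname."""
    return record["surname"].lower()[:3]

def candidate_pairs(records, key_fn):
    """Group record indices by blocking key and pair up only within a block."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[key_fn(rec)].append(idx)
    pairs = set()
    for members in blocks.values():
        pairs.update(combinations(members, 2))
    return pairs

# Hypothetical toy data.
people = [
    {"surname": "Johnson"},
    {"surname": "Johnsen"},
    {"surname": "Smith"},
    {"surname": "Smyth"},
    {"surname": "Jones"},
]

pairs = candidate_pairs(people, blocking_key)
all_pairs = len(people) * (len(people) - 1) // 2
print(f"{len(pairs)} candidate pair(s) instead of {all_pairs}")
```

With this key only Johnson and Johnsen are compared, so far fewer comparisons are needed, but the likely match Smith and Smyth is missed. A shorter key would catch it while producing larger blocks, which is the kind of tuning trade-off the abstract refers to.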
We study the problem of query evaluation over tuple-independent probabilistic databases. We define a new characterization of lineage expressions, called disjoint branch acyclic, and show that expressions in this class can be evaluated in PTIME. Specifically, this work extends the class of lineage expressions for which evaluation can be performed in PTIME. We achieve this ...
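For context, the following sketch computes the probability of a lineage expression by enumerating all possible worlds, which takes time exponential in the number of tuples. It is a brute-force baseline that shows why PTIME classes matter, not the algorithm of the paper; the tuple names and probabilities are hypothetical.

```python
from itertools import product

def lineage_probability(lineage, probs):
    """Sum the probabilities of all possible worlds in which `lineage` holds.

    `probs` maps each tuple variable to its marginal probability; tuples are
    assumed to be independent, so a world's weight is a plain product.
    """
    names = sorted(probs)
    total = 0.0
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        weight = 1.0
        for name in names:
            weight *= probs[name] if world[name] else 1.0 - probs[name]
        if lineage(world):
            total += weight
    return total

# Hypothetical tuples and lineage: (r1 AND s1) OR (r1 AND s2).
probs = {"r1": 0.5, "s1": 0.4, "s2": 0.5}

def lineage(world):
    return (world["r1"] and world["s1"]) or (world["r1"] and world["s2"])

p = lineage_probability(lineage, probs)
print(round(p, 6))
```

The loop visits 2^n worlds for n tuples, so it is only usable on very small lineages.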
Data mining is the process of extracting interesting patterns or knowledge from large amounts of data. In recent years, there has been tremendous growth in the amount of personal data that organizations can collect and analyze. Organizations such as credit card companies, real estate companies, and hospitals collect and hold large volumes of ...
This work extends the class of lineage expressions of queries over tuple-independent probabilistic databases for which evaluation can be performed in PTIME. We define a new characterization of lineage expressions, called γ-acyclic, and present a method to compute the probability of such expressions in PTIME. The method is based on the junction tree message ...
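As a point of reference, the simplest lineage expressions that admit polynomial-time evaluation are read-once formulas, in which every tuple variable appears at most once. The sketch below evaluates such a formula bottom-up using independence. It is not the γ-acyclic or junction tree method of the paper, and the expression encoding and tuple names are assumptions made for illustration.

```python
def read_once_probability(expr, probs):
    """Probability of a read-once AND/OR formula over independent tuples.

    `expr` is a nested tuple: ("var", name), ("and", e1, e2, ...) or
    ("or", e1, e2, ...). The result is only correct if no variable
    occurs more than once in the whole expression.
    """
    kind = expr[0]
    if kind == "var":
        return probs[expr[1]]
    child_probs = [read_once_probability(child, probs) for child in expr[1:]]
    if kind == "and":
        result = 1.0
        for p in child_probs:
            result *= p          # independent conjuncts multiply
        return result
    if kind == "or":
        none_true = 1.0
        for p in child_probs:
            none_true *= 1.0 - p  # probability that every disjunct is false
        return 1.0 - none_true
    raise ValueError(f"unknown node type: {kind!r}")

# Hypothetical lineage r1 AND (s1 OR s2), with each variable used once.
tuple_probs = {"r1": 0.5, "s1": 0.4, "s2": 0.5}
formula = ("and", ("var", "r1"), ("or", ("var", "s1"), ("var", "s2")))

print(round(read_once_probability(formula, tuple_probs), 6))
```

Each node is visited once, so the running time is linear in the size of the formula.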