Data exchange: semantics and query answering
TLDR
This paper gives an algebraic specification that selects, among all solutions to the data exchange problem, a special class of solutions called universal. It shows that a universal solution has no more and no less data than required for data exchange and that it represents the entire space of possible solutions.
Discovering data quality rules
TLDR
This work proposes a new data-driven tool that can be used within an organization's data quality management process to suggest possible rules, and to identify conformant and non-conformant records.
Framework for Evaluating Clustering Algorithms in Duplicate Detection
TLDR
This work uses Stringer to evaluate the quality of the clusters obtained from several unconstrained clustering algorithms used in concert with approximate join techniques, and reveals that some clustering algorithms that have never been considered for duplicate detection perform extremely well in terms of both accuracy and scalability.
LIMBO: Scalable Clustering of Categorical Data
TLDR
This work introduces LIMBO, a scalable hierarchical categorical clustering algorithm that builds on the Information Bottleneck (IB) framework for quantifying the relevant information preserved when clustering, and shows how the LIMBO algorithm can be used to cluster both tuples and values.
First-Order Query Rewriting for Inconsistent Databases
TLDR
An algorithm is given that computes the consistent answers for a large and practical class of conjunctive queries. Given a query q, it returns a first-order query Q such that, for every (potentially inconsistent) database I, the consistent answers for q can be obtained by evaluating Q directly on I.
Association rules over interval data
TLDR
An algorithm is developed for mining association rules under a new definition of interest that takes into account the semantics of interval data, and experience using the algorithm on large real-life datasets is summarized.
...
...