Publications
NADEEF: a commodity data cleaning system
Despite the increasing importance of data quality and the rich theoretical and practical contributions in all aspects of data cleaning, there is no single end-to-end off-the-shelf solution to …
  • Citations: 228 (26 highly influential)
Graph pattern matching
Graph pattern matching is typically defined in terms of subgraph isomorphism, which makes it an NP-complete problem. Moreover, it requires bijective functions, which are often too restrictive to …
  • Citations: 256 (26 highly influential)
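The abstract above contrasts subgraph isomorphism, which needs a one-to-one mapping from pattern nodes to data nodes, with less restrictive notions of matching. The Python sketch below is illustrative only and is not the algorithm proposed in the paper. It assumes small node-labelled directed graphs stored as a label dictionary plus an edge set, finds a (non-induced) subgraph isomorphism by exponential backtracking, and, for contrast, computes plain graph simulation, a polynomial-time relaxation that matches with a relation instead of a function. The names `subgraph_isomorphism` and `graph_simulation` and the toy graphs are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# A graph is (node -> label, set of directed edges (source, target)).
Graph = Tuple[Dict[str, str], Set[Tuple[str, str]]]


def subgraph_isomorphism(pattern: Graph, data: Graph) -> Optional[Dict[str, str]]:
    """Return one injective mapping of pattern nodes to data nodes that
    preserves labels and pattern edges (non-induced), or None if none exists.
    Plain backtracking: exponential in the worst case."""
    p_labels, p_edges = pattern
    d_labels, d_edges = data
    order = list(p_labels)

    def image(node: str, u: str, v: str, mapping: Dict[str, str]) -> str:
        # Where a pattern node lands if u is tentatively mapped to v.
        return v if node == u else mapping[node]

    def extend(mapping: Dict[str, str]) -> Optional[Dict[str, str]]:
        if len(mapping) == len(order):
            return dict(mapping)
        u = order[len(mapping)]
        used = set(mapping.values())
        for v, label in d_labels.items():
            if v in used or label != p_labels[u]:
                continue  # keep the mapping injective and label-preserving
            # Every pattern edge among already-mapped nodes and u must exist in the data.
            if all(
                (image(a, u, v, mapping), image(b, u, v, mapping)) in d_edges
                for a, b in p_edges
                if (a == u or a in mapping) and (b == u or b in mapping)
            ):
                mapping[u] = v
                found = extend(mapping)
                if found is not None:
                    return found
                del mapping[u]  # backtrack
        return None

    return extend({})


def graph_simulation(pattern: Graph, data: Graph) -> Dict[str, Set[str]]:
    """Largest relation in which pattern node u may match data node v when the
    labels agree and every pattern edge (u, u2) is matched by some data edge
    (v, v2) with v2 a match of u2. An empty set for any u means no match."""
    p_labels, p_edges = pattern
    d_labels, d_edges = data
    sim = {u: {v for v, lab in d_labels.items() if lab == p_labels[u]} for u in p_labels}
    changed = True
    while changed:  # refine until a fixpoint is reached
        changed = False
        for u, u2 in p_edges:
            keep = {v for v in sim[u] if any((v, v2) in d_edges for v2 in sim[u2])}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    return sim


pattern = ({"x": "A", "y": "B"}, {("x", "y"), ("y", "x")})  # 2-cycle A <-> B
data = (
    {"a1": "A", "b1": "B", "a2": "A", "b2": "B"},
    {("a1", "b1"), ("b1", "a2"), ("a2", "b2"), ("b2", "a1")},  # 4-cycle
)
print(subgraph_isomorphism(pattern, data))  # None: no A-B pair is linked both ways
print(graph_simulation(pattern, data))      # x -> {a1, a2}, y -> {b1, b2}
```

On this toy input no isomorphic embedding of the two-node cycle exists, yet simulation relates every pattern node to both data nodes with the same label, which illustrates how a one-to-one requirement can rule out intuitively reasonable matches.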
Towards certain fixes with editing rules and master data
A variety of integrity constraints have been studied for data cleaning. While these constraints can detect the presence of errors, they fall short of guiding us to correct the errors. Indeed, data …
  • Citations: 156 (22 highly influential)
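To make the detect-versus-correct distinction concrete, here is a minimal sketch assuming rows are Python dictionaries and the constraint is a single functional dependency (FD) `lhs -> rhs`. The function name `fd_violations` and the zip/city rows are hypothetical, and this is not the editing-rule approach of the paper; it only shows what a constraint alone can report.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def fd_violations(
    rows: Sequence[Row], lhs: Sequence[str], rhs: str
) -> Dict[Tuple[str, ...], List[int]]:
    """Detect violations of the functional dependency lhs -> rhs.

    Rows are grouped by their lhs values; every group whose rhs column holds
    more than one distinct value is reported as a list of row indices. The
    result says which rows conflict, not which value is wrong."""
    groups: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in lhs)].append(i)
    return {
        key: idxs
        for key, idxs in groups.items()
        if len({rows[i][rhs] for i in idxs}) > 1
    }


# Hypothetical customer rows; the FD says a zip code determines the city.
rows = [
    {"name": "Ann", "zip": "EH8", "city": "Edinburgh"},
    {"name": "Bob", "zip": "EH8", "city": "Edinburgh"},
    {"name": "Cat", "zip": "EH8", "city": "London"},
    {"name": "Dan", "zip": "W1", "city": "London"},
]
print(fd_violations(rows, ["zip"], "city"))  # {('EH8',): [0, 1, 2]}
```

The detector flags all three EH8 rows, but nothing in the FD tells us whether Cat's city or Cat's zip code is the erroneous value; choosing a correct fix needs information beyond the constraint itself.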
Interaction between Record Matching and Data Repairing
Central to a data cleaning system are record matching and data repairing. Matching aims to identify tuples that refer to the same real-world object, and repairing is to make a database consistent by …
  • Citations: 125 (11 highly influential)
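Record matching, as described above, can be illustrated with a hypothetical rule-based sketch: a user-supplied predicate decides whether two tuples refer to the same real-world object, and union-find merges matched pairs into clusters so that matches are transitive. The rule (equal phone number and case-insensitively equal last name), the field names, and `match_records` are assumptions for illustration, not the matching method of the paper.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, str]


def match_records(rows: Sequence[Row], same: Callable[[Row, Row], bool]) -> List[int]:
    """Cluster rows that the matching rule says refer to the same entity.
    Returns one cluster id (a representative row index) per row. All pairs
    are compared, so this is quadratic in the number of rows."""
    parent = list(range(len(rows)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if same(rows[i], rows[j]):
                parent[find(j)] = find(i)  # union the two clusters
    return [find(i) for i in range(len(rows))]


def same_person(a: Row, b: Row) -> bool:
    # Hypothetical matching rule: equal phone and equal last name, ignoring case.
    return a["phone"] == b["phone"] and a["last"].lower() == b["last"].lower()


rows = [
    {"last": "Smith", "phone": "555-0100"},
    {"last": "smith", "phone": "555-0100"},
    {"last": "Jones", "phone": "555-0100"},
    {"last": "Smith", "phone": "555-0199"},
]
print(match_records(rows, same_person))  # [0, 0, 2, 3]
```

Only the first two rows are merged; a repairing step could then reconcile conflicting values inside each cluster, which is one point where matching and repairing interact.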
Distributed Representations of Tuples for Entity Resolution
Despite more than 70 years of effort in all aspects of entity resolution (ER), there is still a high demand for democratizing ER – by reducing the heavy human involvement in labeling data, performing …
  • Citations: 47 (10 highly influential)
KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing
Classical approaches to clean data have relied on using integrity constraints, statistics, or machine learning. These approaches are known to be limited in the cleaning accuracy, which can usually be …
  • Citations: 169 (9 highly influential)
Adding regular expressions to graph reachability and pattern queries
It is increasingly common to find graphs in which edges bear different types, indicating a variety of relationships. For such graphs we propose a class of reachability queries and a class of graph …
  • Citations: 90 (8 highly influential)
Adding regular expressions to graph reachability and pattern queries
It is increasingly common to find graphs in which edges are of different types, indicating a variety of relationships. For such graphs we propose a class of reachability queries and a class of graph …
  • Citations: 68 (6 highly influential)
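A reachability query whose path is constrained by a regular expression over edge types can be evaluated by searching the product of the graph and an automaton for the expression. The sketch below assumes the expression has already been compiled by hand into a nondeterministic finite automaton (NFA); it does not model the specific class of expressions defined in the paper, and `regular_reachable`, the edge types, and the node names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source node, edge type, target node)
# NFA: transitions[state][edge type] -> next states, plus start and accepting states.
NFA = Tuple[Dict[int, Dict[str, Set[int]]], int, Set[int]]


def regular_reachable(edges: List[Edge], nfa: NFA, src: str, dst: str) -> bool:
    """True if some path from src to dst spells a sequence of edge types
    accepted by the NFA. Breadth-first search over (node, NFA state) pairs."""
    transitions, start, accepting = nfa
    out: Dict[str, List[Tuple[str, str]]] = {}
    for s, etype, t in edges:
        out.setdefault(s, []).append((etype, t))
    seen = {(src, start)}
    queue = deque([(src, start)])
    while queue:
        node, state = queue.popleft()
        if node == dst and state in accepting:
            return True
        for etype, nxt in out.get(node, []):
            for nstate in transitions.get(state, {}).get(etype, set()):
                if (nxt, nstate) not in seen:
                    seen.add((nxt, nstate))
                    queue.append((nxt, nstate))
    return False


# Hypothetical query "friend+ works_with": one or more friend edges, then one works_with edge.
friend_then_colleague: NFA = (
    {0: {"friend": {1}}, 1: {"friend": {1}, "works_with": {2}}},
    0,
    {2},
)
edges = [
    ("ann", "friend", "bob"),
    ("bob", "friend", "cat"),
    ("cat", "works_with", "dan"),
    ("ann", "works_with", "eve"),
]
print(regular_reachable(edges, friend_then_colleague, "ann", "dan"))  # True
print(regular_reachable(edges, friend_then_colleague, "ann", "eve"))  # False
```

Each (node, state) pair is enqueued at most once, so the search is polynomial in the sizes of the graph and the automaton, in contrast to the NP-complete subgraph isomorphism check discussed earlier in this list.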
BigDansing: A System for Big Data Cleansing
Data cleansing approaches have usually focused on detecting and fixing errors with little attention to scaling to big datasets. This presents a serious impediment since data cleansing often involves …
  • Citations: 105 (5 highly influential)
The Data Civilizer System
In many organizations, it is often challenging for users to find relevant data for specific tasks, since the data is usually scattered across the enterprise and often inconsistent. In fact, data …
  • Citations: 85 (4 highly influential)