KATARA: Reliable Data Cleaning with Knowledge Bases and Crowdsourcing

Xu Chu, Mourad Ouzzani, John Morcos, Ihab F. Ilyas, Paolo Papotti, Nan Tang, Yin Ye
Data cleaning with guaranteed reliability is hard to achieve without accessing external sources, since the truth is not necessarily discoverable from the data at hand. Furthermore, even in the presence of external sources, mainly knowledge bases and humans, effectively leveraging them still faces many challenges, such as aligning heterogeneous data sources and decomposing a complex task into simpler units that can be consumed by humans. We present Katara, a novel end-to-end data cleaning system…