Effective Data Cleaning with Continuous Evaluation

Abstract

Enterprises have been acquiring large amounts of data from a variety of sources to build their own “Data Lakes”, with the goal of enriching their data assets and enabling richer and more informed analytics. The pace of acquisition and the variety of data sources make it impossible to clean this data as it arrives. This new reality has made data cleaning a continuous process and a part of day-to-day data processing activities. The large body of data cleaning algorithms and techniques is strong evidence of how complex the problem is, yet these techniques have had little success in being adopted in real-world data cleaning applications. In this article we examine how the community has been evaluating the effectiveness of data cleaning algorithms, and whether current data cleaning proposals are solving the right problems to enable the development of deployable and effective solutions.

4 Figures and Tables

Cite this paper

@article{Ilyas2016EffectiveDC,
  title={Effective Data Cleaning with Continuous Evaluation},
  author={Ihab F. Ilyas},
  journal={IEEE Data Eng. Bull.},
  year={2016},
  volume={39},
  pages={38-46}
}