Si Yin

We present NADEEF, an extensible, generic and easy-to-deploy data cleaning system. NADEEF distinguishes between a programming interface and a core to achieve generality and extensibility. The programming interface allows users to specify data quality rules by writing code that implements predefined classes. These classes uniformly define what is wrong with ...
Data cleansing approaches have usually focused on detecting and fixing errors with little attention to scaling to big datasets. This presents a serious impediment since data cleansing often involves costly computations such as enumerating pairs of tuples, handling inequality joins, and dealing with user-defined functions. In this paper, we present ...
networks are part of the telecommunications infrastructure that connect individual subscribers to the service provider's central office (CO) over public ground. They are cost-prohibitive and have consistently been regarded as the bottleneck, primarily because the ever-growing demand for higher bandwidth is beyond the supported levels of the widely deployed ...
Entity resolution (ER), the process of identifying and eventually merging records that refer to the same real-world entities, is an important and long-standing problem. We present Nadeef/Er, a generic and interactive entity resolution system, which is built as an extension over our open-source generalized data cleaning system Nadeef. Nadeef/Er provides a ...