Si Yin

Data cleansing approaches have usually focused on detecting and fixing errors with little attention to scaling to big datasets. This presents a serious impediment since data cleansing often involves costly computations such as enumerating pairs of tuples, handling inequality joins, and dealing with user-defined functions. In this paper, we present…
We present NADEEF, an extensible, generic and easy-to-deploy data cleaning system. NADEEF distinguishes between a programming interface and a core to achieve generality and extensibility. The programming interface allows users to specify data quality rules by writing code that implements predefined classes. These classes uniformly define what is wrong with…
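To illustrate the idea of specifying a data quality rule by implementing a predefined class, here is a minimal sketch of a pairwise rule interface. The names `PairRule`, `FDRule` and `detect` are hypothetical and are not NADEEF's actual API; the sketch assumes a rule is expressed as a predicate over two tuples, using a functional dependency (equal `lhs` values must imply equal `rhs` values) as the example rule.

```python
from abc import ABC, abstractmethod
from itertools import combinations
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


class PairRule(ABC):
    """Hypothetical base class: a rule states what is wrong with a pair of tuples."""

    @abstractmethod
    def violated(self, t1: Record, t2: Record) -> bool:
        ...


class FDRule(PairRule):
    """Functional dependency lhs -> rhs (hypothetical example rule)."""

    def __init__(self, lhs: str, rhs: str) -> None:
        self.lhs = lhs
        self.rhs = rhs

    def violated(self, t1: Record, t2: Record) -> bool:
        # Same left-hand side but different right-hand side is a violation.
        return t1[self.lhs] == t2[self.lhs] and t1[self.rhs] != t2[self.rhs]


def detect(rule: PairRule, records: List[Record]) -> List[Tuple[int, int]]:
    # Naive detection checks all n(n-1)/2 tuple pairs, i.e. the quadratic
    # pair enumeration that makes data cleansing costly on big datasets.
    return [
        (i, j)
        for (i, t1), (j, t2) in combinations(enumerate(records), 2)
        if rule.violated(t1, t2)
    ]


records = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "94105", "city": "San Francisco"},
]
print(detect(FDRule("zip", "city"), records))  # prints [(0, 1)]
```

Because the detection loop only calls `violated`, new rule types can be added by subclassing without touching the loop, which mirrors the separation between a programming interface and a core described above.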
…networks are part of the telecommunications infrastructure that connect individual subscribers to the service provider's central office (CO) over public ground. They are cost prohibitive and have consistently been regarded as the bottleneck, primarily because the ever-growing demand for higher bandwidth is beyond the supported levels of the widely deployed…
…transmission efficiency have been investigated in an ad hoc manner. In this paper, we establish a general state space model to analyze the stability of the NLPDBA schemes from the TDM-PON system's point of view, and propose controller design guidelines to maintain the system stability under different scenarios. We prove that a TDM-PON system with NLPDBA is…
Entity resolution (ER), the process of identifying and eventually merging records that refer to the same real-world entities, is an important and long-standing problem. We present Nadeef/Er, a generic and interactive entity resolution system, which is built as an extension over our open-source generalized data cleaning system Nadeef. Nadeef/Er provides a…