Si Yin

Data cleansing approaches have usually focused on detecting and fixing errors with little attention to scaling to big datasets. This presents a serious impediment since data cleansing often involves costly computations such as enumerating pairs of tuples, handling inequality joins, and dealing with user-defined functions. In this paper, we present(More)
We present NADEEF, an extensible, generic and easy-to-deploy data cleaning system. NADEEF distinguishes between a programming interface and a core to achieve generality and extensibility. The programming interface allows users to specify data quality rules by writing code that implements predefined classes. These classes uniformly define what is wrong with(More)
Passive optical networks (PONs) are a prominent broadband access solution to tackle the "last mile" bottleneck in telecommunications infrastructure. Data transmission over standardized PONs is divided into time slots. For improving PON performance, a critical issue is resource management in the upstream transmission from multiple optical(More)
transmission efficiency have been investigated in an ad hoc manner. In this paper, we establish a general state space model to analyze the stability of the NLPDBA schemes from the TDM-PON system's point of view, and propose controller design guidelines to maintain the system stability under different scenarios. We prove that a TDM-PON system with NLPDBA is(More)
Entity resolution (ER), the process of identifying and eventually merging records that refer to the same real-world entities, is an important and long-standing problem. We present Nadeef/Er, a generic and interactive entity resolution system, which is built as an extension over our open-source generalized data cleaning system Nadeef. Nadeef/Er provides a(More)