Ling-Ling Yan

Clio is a system for managing and facilitating the complex tasks of heterogeneous data transformation and integration. In Clio, we have collected together a powerful set of data management techniques that have proven invaluable in tackling these difficult problems. In this paper, we present the underlying themes of our approach and present a brief case…
At the heart of many data-intensive applications is the problem of quickly and accurately transforming data into a new form. Database researchers have long advocated the use of declarative queries for this process. Yet tools for creating, managing and understanding the complex queries necessary for data transformation are still too primitive to permit…
Conflict-tolerant queries are a new way of dealing with instance-level conflicts in data integrated from multiple sources. In contrast to the traditional approach of resolving such conflicts during schema integration using aggregation functions, we establish a query model and processing techniques to tolerate these conflicts at query time to a degree…
The AURORA mediator system employs a novel 2-tier, plug-and-play mediation model that is designed to facilitate access to a large number of heterogeneous data sources. This paper describes AURORA's mediation model and a suite of techniques used by a specific AURORA mediator, AURORA-RH. This suite includes a mediation methodology provided via an interactive…
M. Roth, M. A. Hernandez, P. Coulthard, L. Yan, L. Popa, H. C.-T. Ho, C. C. Salter
Extensible Markup Language (XML) has grown rapidly over the last decade to become the de facto standard for heterogeneous data exchange. Its popularity is due in large part to the ease with which diverse kinds of information can be represented as a result of the self-describing…
We propose a normal form for nested relations, called NF-NR, which removes undesirable anomalies from a nested relational database schema. Both functional dependencies and multivalued dependencies are considered. NF-NR reduces to 3NF/4NF if the nested relation considered is actually a flat relation. In particular, NF-NR removes global redundancies among a set…
We present a modular breakdown of data integration tasks and the results of a survey on the distribution of effort among those tasks. The modularization aids in project planning and enables portions of the work to be allocated flexibly among various human specialists, and also to automated tools. The survey results are useful for determining: (1) what are…