Renée J. Miller

Learn More
We present a novel framework for mapping between any combination of XML and relational schemas, in which a high-level, userspecified mapping is translated into semantically meaningful queries that transform source data into the target representation. Our approach works in two phases. In the first phase, the high-level mapping, expressed as a set of(More)
Dirty data is a serious problem for businesses leading to incorrect decision making, inefficient daily operations, and ultimately wasting both time and money. Dirty data often arises when domain constraints and business rules, meant to preserve data consistency and accuracy, are enforced incompletely or not at all in application code. In this work, we(More)
Clustering is a problem of great practical importance in numerous applications. The problem of clustering becomes more challenging when the data is categorical, that is, when there is no inherent distance measure between data values. We introduce LIMBO, a scalable hierarchical categorical clustering algorithm that builds on the Information Bottleneck (IB)(More)
Clio is a system for managing and facilitating the complex tasks of heterogeneous data transformation and integration. In Clio, we have collected together a powerful set of data management techniques that have proven invaluable in tackling these difficult problems. In this paper, we present the underlying themes of our approach and present a brief case(More)
The detection of duplicate tuples, corresponding to the same real-world entity, is an important task in data integration and cleaning. While many techniques exist to identify such tuples, the merging or elimination of duplicates can be a difficult task that relies on ad-hoc and often manual solutions. We propose a complementary approach that permits(More)
We consider the problem of mapping data in peer-to-peer data-sharing systems. Such systems often rely on the use of mapping tables listing pairs of corresponding values to search for data residing in different peers. In this paper, we address semantic and algorithmic issues related to the use of mapping tables. We begin by arguing why mapping tables are(More)
We consider the integration requirements of modern data intensive applications including data warehousing, global information systems and electronic commerce. At the heart of these requirements lies the schema mapping problem in which a source (legacy) database must be mapped into a different, but xed, target schema. The goal of schema mapping is the(More)