Wilburt Labio

Warehouse views need to be updated when source data changes. Due to the constantly increasing size of warehouses and the rapid rates of change, there is increasing pressure to reduce the time taken for updating the warehouse views. In this paper we focus on reducing this “update window” by minimizing the work required to compute and install a…
The goal of the data warehousing project at Stanford (the WHIPS project) is to develop algorithms and tools for the efficient collection and integration of information from heterogeneous and autonomous sources, including legacy sources. In this paper we give a brief overview of the WHIPS project, and we describe some of the research problems being addressed…
A data warehouse collects and integrates data from multiple, autonomous, heterogeneous sources. The warehouse effectively maintains one or more materialized views over the source data. In this paper we describe the architecture of the Whips prototype system, which collects, transforms, and integrates data for the warehouse. We show how the required…
On the Internet, the limited query-processing capabilities of sources make answering even the simplest queries challenging. In this paper, we present a scheme called GenCompact for generating capability-sensitive plans for queries on Internet sources. The query plans generated by GenCompact have the following advantages over those generated by existing…
Data warehouses collect data into materialized views for analysis. After some time, some of the data may no longer be needed or may not be of interest. In this paper, we handle this by expiring or removing unneeded materialized view tuples. A framework supporting such expiration is presented. Within it, a user or administrator can declaratively request…
Data warehouses collect large quantities of data from distributed sources into a single repository. A typical load to create or maintain a warehouse processes GBs of data, takes hours or even days to execute, and involves many complex and user-defined transformations of the data (e.g., find duplicates, resolve data inconsistencies, and add unique keys). If…
A data warehouse is a repository of data that has been extracted and integrated from heterogeneous and autonomous distributed sources. For example, a grocery store chain might integrate data from its inventory database, sales databases from different stores, and its marketing department’s promotions records. Warehouse applications differ from traditional…
A data warehouse is a repository of integrated information from distributed, autonomous, and possibly heterogeneous sources. In effect, the warehouse stores one or more materialized views of the source data. The data is then readily available to user applications for querying and analysis. Figure 1 shows the basic architecture of a warehouse: data is…