This chapter covers the Trio database management system. Trio is a robust prototype that supports uncertain data and data lineage, along with the standard features of a relational DBMS. Trio's new ULDB data model is an extension to the relational model capturing various types of uncertainty along with data lineage, and its TriQL query language extends SQL…
Disk-oriented approaches to online storage are becoming increasingly problematic: they do not scale gracefully to meet the needs of large-scale Web applications, and improvements in disk capacity have far outstripped improvements in access latency and bandwidth. This paper argues for a new approach to datacenter storage called RAMCloud, where information is…
We study how best to schedule scans of large data files, in the presence of many simultaneous requests to a common set of files. The objective is to maximize the overall rate of processing these files, by sharing scans of the same file as aggressively as possible, without imposing undue wait time on individual jobs. This scheduling problem arises in batch…
Trio is a new kind of database system that supports data, uncertainty, and lineage in a fully integrated manner. The first Trio prototype, dubbed Trio-One, is built on top of a conventional DBMS using data and query translation techniques together with a small number of stored procedures. This paper describes Trio-One's translation scheme and system…
Prior work has identified set-based comparisons as a useful primitive for supporting a wide variety of similarity functions in record matching. Accordingly, various techniques have been proposed to improve the performance of set similarity lookups. However, this body of work focuses almost exclusively on symmetric notions of set similarity. In this…
The query models of the recent generation of very large scale distributed (VLSD) shared-nothing data storage systems, including our own PNUTS and others (e.g., BigTable, Dynamo, and Cassandra), are intentionally simple, focusing on simple lookups and scans and trading query expressiveness for massive scale. Indexes and views can expand the query…
There has been considerable past work studying data integration and uncertain data in isolation. We develop the foundations for local-as-view (LAV) data integration when the sources being integrated are uncertain. We motivate two distinct settings for uncertain-data integration. We then define containment of uncertain databases in these settings, which…
We present extensions to Trio for incorporating continuous uncertainty into the system. Data items with uncertain possible values drawn from a continuous domain are represented through a generic set of functions. Our approach enables precise and efficient representation of arbitrary probability distribution functions, along with standard distributions such…