Database researchers have striven to improve the capability of a database in terms of both performance and functionality. We assert that the usability of a database is as important as its capability. In this paper, we study why database systems today are so difficult to use. We identify a set of five pain points and propose a research agenda to ...
Computing interesting measures for data cubes and subsequent mining of interesting cube groups over massive datasets are critical for many important analyses done in the real world. Previous studies have focused on algebraic measures such as SUM that are amenable to parallel computation and can easily benefit from the recent advancement of parallel ...
1. MOTIVATION Affordable storage has grown dramatically over the last decade, enabling large-scale archival of email, contacts, documents, images, and music. Large personal storage becomes all the more useful with effective search tools. Recent operating systems can index PC hard disks and enable keyword search over many file types. All Web-based email ...
Autocompletion is a widely deployed facility in systems that require user input. Having the system complete a partially typed "word" can save user time and effort. In this paper, we study the problem of autocompletion not just at the level of a single "word", but at the level of a multi-word "phrase". There are two main challenges: one is that the ...
In recent years, many Array DBMSs, including SciDB and RasDaMan, have emerged to meet the needs of data management applications in which arrays are the natural data structure. These systems, like their relational counterparts, involve an expensive data ingestion phase. The paradigm of using native storage as a DB and providing database-like support ...
Cube computation over massive datasets is critical for many important analyses done in the real world. Unlike commonly studied algebraic measures such as SUM that are amenable to parallel computation, efficient cube computation of holistic measures such as TOP-K is non-trivial and often impossible with current methods. In this paper we detail real-world ...
We address the problem of unsupervised matching of schema information from a large number of data sources into the schema of a data warehouse. The matching process is the first step of a framework to integrate data feeds from third-party data providers into a structured-search engine's data warehouse. Our experiments show that traditional schema-based and ...