Learn More
In this paper, we introduce self-tuning histograms. Although similar in structure to traditional histograms, these histograms infer data distributions not by examining the data or a sample thereof, but by using feedback from the query execution engine about the actual selectivity of range selection operators to progressively refine the histogram. Since the(More)
Recently, there has been a great deal of research into XML query languages to enable the execution of database-style queries over XML files. However, merely being an XML query-processing engine does not render a system suitable for querying the Internet. A useful system must provide mechanisms to (a) find the XML files that are relevant to a given query,(More)
Cardinality estimation is a crucial part of a cost-based optimizer. Many research efforts have been focused on XML synopsis structures of path queries for cardinality estimation in recent years. In ideal situations, a synopsis should provide accurate estimates for different types of queries over a wide variety of data sets, consume a small amount of memory(More)
The rich dependency structure found in the columns of real-world relational databases can be exploited to great advantage, but can also cause query optimizers---which usually assume that columns are statistically independent---to underestimate the selectivities of conjunctive predicates by orders of magnitude. We introduce CORDS, an efficient and scalable(More)
Data on the Internet is increasingly presented in XML format. This enables novel applications that pose queries over " all the XML data on the Inter-net. " Queries over XML data use path expressions to navigate through the structure of the data, and optimizing these queries requires estimating the selectivity of these path expressions. In this paper , we(More)
Virtual machine monitors are becoming popular tools for the deployment of database management systems and other enterprise software. In this article, we consider a common resource consolidation scenario in which several database management system instances, each running in a separate virtual machine, are sharing a common pool of physical computing(More)
Synthetically generated data has always been important for evaluating and understanding new ideas in database research. In this paper, we describe a data generator for generating synthetic complex-structured XML data that allows for a high level of control over the characteristics of the generated data. This data generator is certainly not the ultimate(More)
Storage servers, as well as storage clients, typically have large memories in which they cache data blocks. This creates a two-tier cache hierarchy in which the presence of a first-tier cache (at the storage client) makes it more difficult to manage the second-tier cache (at the storage server). Many techniques have been proposed for improving the(More)
— In three-tiered web applications, some form of admission control is required to ensure that throughput and response times are not significantly harmed during periods of heavy load. We propose Q-Cop, a prototype system for improving admission control decisions that considers a combination of the load on the system, the number of simultaneous queries being(More)
A question that database administrators (DBAs) routinely need to answer is how long a batch query workload will take to complete. This question arises, for example, while planning the execution of different report-generation workloads to fit within available time windows. To answer this question accurately, we need to take into account that the typical(More)