Learn More
In this paper, we introduce self-tuning histograms. Although similar in structure to traditional histograms, these histograms infer data distributions not by examining the data or a sample thereof, but by using feedback from the query execution engine about the actual selectivity of range selection operators to progressively refine the histogram. Since the(More)
Recently, there has been a great deal of research into XML query languages to enable the execution of database-style queries over XML files. However, merely being an XML query-processing engine does not render a system suitable for querying the Internet. A useful system must provide mechanisms to (a) find the XML files that are relevant to a given query,(More)
The rich dependency structure found in the columns of real-world relational databases can be exploited to great advantage, but can also cause query optimizers---which usually assume that columns are statistically independent---to underestimate the selectivities of conjunctive predicates by orders of magnitude. We introduce CORDS, an efficient and scalable(More)
We propose XSEED, a synopsis of path queries for cardinality estimation that is accurate, robust, efficient, and adaptive to memory budgets. XSEED starts from a very small kernel, and then incrementally updates information of the synopsis. With such an incremental construction, a synopsis structure can be dynamically configured to accommodate different(More)
Data on the Internet is increasingly presented in XML format. This enables novel applications that pose queries over " all the XML data on the Inter-net. " Queries over XML data use path expressions to navigate through the structure of the data, and optimizing these queries requires estimating the selectivity of these path expressions. In this paper , we(More)
Storage servers, as well as storage clients, typically have large memories in which they cache data blocks. This creates a two-tier cache hierarchy in which the presence of a first-tier cache (at the storage client) makes it more difficult to manage the second-tier cache (at the storage server). Many techniques have been proposed for improving the(More)
Virtual machine monitors are becoming popular tools for the deployment of database management systems and other enterprise software. In this article, we consider a common resource consolidation scenario in which several database management system instances, each running in a separate virtual machine, are sharing a common pool of physical computing(More)
Synthetically generated data has always been important for evaluating and understanding new ideas in database research. In this paper, we describe a data generator for generating synthetic complex-structured XML data that allows for a high level of control over the characteristics of the generated data. This data generator is certainly not the ultimate(More)
A question that database administrators (DBAs) routinely need to answer is how long a batch query workload will take to complete. This question arises, for example, while planning the execution of different report-generation workloads to fit within available time windows. To answer this question accurately, we need to take into account that the typical(More)
Indexing large XML databases is crucial for efficient evaluation of XML twig queries. In this paper, we propose a feature-based indexing technique, called FIX, based on spectral graph theory. The basic idea is that for each twig pattern in a collection of XML documents, we calculate a vector of features based on its structural properties. These features are(More)