Ashraf Aboulnaga

Learn More
In this paper, we introduce self-tuning histograms. Although similar in structure to traditional histograms, these histograms infer data distributions not by examining the data or a sample thereof, but by using feedback from the query execution engine about the actual selectivity of range selection operators to progressively refine the histogram. Since the(More)
Data on the Internet is increasingly presented in XML format. This enables novel applications that pose queries over “all the XML data on the Internet.” Queries over XML data use path expressions to navigate through the structure of the data, and optimizing these queries requires estimating the selectivity of these path expressions. In this paper, we(More)
The rich dependency structure found in the columns of real-world relational databases can be exploited to great advantage, but can also cause query optimizers---which usually assume that columns are statistically independent---to underestimate the selectivities of conjunctive predicates by orders of magnitude. We introduce CORDS, an efficient and scalable(More)
Virtual machine monitors are becoming popular tools for the deployment of database management systems and other enterprise software. In this article, we consider a common resource consolidation scenario in which several database management system instances, each running in a separate virtual machine, are sharing a common pool of physical computing(More)
We propose XSEED, a synopsis of path queries for cardinality estimation that is accurate, robust, efficient, and adaptive to memory budgets. XSEED starts from a very small kernel, and then incrementally updates information of the synopsis. With such an incremental construction, a synopsis structure can be dynamically configured to accommodate different(More)
Storage servers, as well as storage clients, typically have large memories in which they cache data blocks. This creates a two-tier cache hierarchy in which the presence of a first-tier cache (at the storage client) makes it more difficult to manage the second-tier cache (at the storage server). Many techniques have been proposed for improving the(More)
Recently, there has been a great deal of research into XML query languages to enable the execution of database-style queries over XML files. However, merely being an XML query-processing engine does not render a system suitable for querying the Internet. A useful system must provide mechanisms to (a) find the XML files that are relevant to a given query,(More)
In this paper, we present a technique for building a high-availability (HA) database management system (DBMS). The proposed technique can be applied to any DBMS with little or no customization, and with reasonable performance overhead. Our approach is based on Remus, a commodity HA solution implemented in the virtualization layer, that uses asynchronous(More)
We present a scalable and fault-tolerant solution for the maximum clique problem based on the MapReduce framework. The key contribution that enables us to effectively use MapReduce is a recursive partitioning method that partitions the graph into several subgraphs of similar size. After partitioning, the maximum cliques of the different partitions can be(More)