Performant execution of data-parallel jobs needs good execution plans. Certain properties of the code, the data, and the interaction between them are crucial to generate these plans. Yet, these properties are difficult to estimate due to the highly distributed nature of these frameworks, the freedom that allows users to specify arbitrary code as...
Companies providing cloud-scale data services have increasing needs to store and analyze massive data sets (e.g., search logs, click streams, and web graph data). For cost and performance reasons, processing is typically done on large clusters of thousands of commodity machines by using high-level scripting languages. In the recent past, there has been...
Companies providing cloud-scale data services have increasing needs to store and analyze massive data sets, such as search logs, click streams, and web graph data. For cost and performance reasons, processing is typically done on large clusters of tens of thousands of commodity machines. Such massive data analysis on large clusters presents new...
Bitmaps are popular indexes for data warehouse (DW) applications and most database management systems offer them today. This paper proposes query optimization strategies for selections using bitmaps. Both continuous and discrete selection criteria are considered. Query optimization strategies are categorized into static and...
This paper presents an architecture overview of the distributed, heterogeneous query processor (DHQP) in the Microsoft SQL Server database system to enable queries over a large collection of diverse data sources. The paper highlights three salient aspects of the architecture. First, the system introduces well-defined abstractions such as connections,...