Learn More
Companies providing cloud-scale data services have increasing needs to store and analyze massive data sets, such as search logs, click streams, and web graph data. For cost and performance reasons, processing is typically done on large clusters of tens of thousands of commodity machines. Such massive data analysis on large clusters presents new(More)
– Performant execution of data-parallel jobs needs good execution plans. Certain properties of the code, the data, and the interaction between them are crucial to generate these plans. Yet, these properties are dif-ï¿¿cult to estimate due to the highly distributed nature of these frameworks, the freedom that allows users to specify arbitrary code as(More)
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to(More)
This paper presents an architecture overview of the distributed, heterogeneous query processor (DHQP) in the Microsoft SQL Server database system to enable queries over a large collection of diverse data sources. The paper highlights three salient aspects of the architecture. First, the system introduces well-defined abstractions such as connections,(More)
Companies providing cloud-scale data services have increasing needs to store and analyze massive data sets (e.g., search logs, click streams, and web graph data). For cost and performance reasons, processing is typically done on large clusters of thousands of commodity machines by using high level scripting languages. In the recent past, there has been(More)