The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition
@inproceedings{Barroso2013TheDA,
  title={The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition},
  author={Luiz Andr{\'e} Barroso and Jimmy Clidaras and Urs H{\"o}lzle},
  booktitle={The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition},
  year={2013}
}
Abstract
As computation continues to move into the cloud, the computing platform of interest no longer resembles a pizza box or a refrigerator, but a warehouse full of computers. These new large datacenters are quite different from traditional hosting facilities of earlier times and cannot be viewed simply as a collection of co-located servers. Large portions of the hardware and software resources in these facilities must work in concert to efficiently deliver good levels of Internet service…
361 Citations
Profiling a warehouse-scale computer
- Computer Science · 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)
- 2015
A detailed microarchitectural analysis of live datacenter jobs, measured on more than 20,000 Google machines over a three-year period and comprising thousands of different applications, finds that WSC workloads are extremely diverse, breeding the need for architectures that can tolerate application variability without performance loss.
Composable architecture for rack scale big data computing
- Computer Science · Future Gener. Comput. Syst.
- 2017
Chariots: A Scalable Shared Log for Data Management in Multi-Datacenter Cloud Environments
- Computer Science · EDBT
- 2015
This work proposes a novel distributed log store, called the Fractal Log Store (FLStore), that overcomes the bottleneck of a single-point of contention in shared log infrastructures, and proposes Chariots, which provides multi-datacenter replication for shared logs.
Rack-Scale Memory Pooling for Datacenters
- Computer Science
- 2017
This thesis proposes rack-scale memory pooling (RSMP), a new scaling technique for future datacenters that reduces networking overheads and improves the performance of core datacenter software, and proposes a new RSMP design that leverages integration and a NUMA fabric to narrow the gap between local and remote memory to only a 5× difference in access latency.
Architecting Efficient Data Centers
- Computer Science
- 2012
The PowerNap server architecture is introduced, a coordinated full-system idle lowpower mode which transitions in and out of an ultra-low power nap state to save power during brief idle periods, and DreamWeaver, architectural support for deep sleep.
Porting LibRIPC to iWARP
- Computer Science
- 2012
A port of LibRIPC to iWARP is presented, which enables the library for use over Ethernet, one of the most cost-efficient network fabrics, and which provides capabilities for high-performance networking since the advent of standards specifying data rates of 10 Gbit/s and above.
Internet-based Virtual Computing Environment: Beyond the data center as a computer
- Computer Science · Future Gener. Comput. Syst.
- 2013
BigHouse: A simulation infrastructure for data center systems
- Computer Science · 2012 IEEE International Symposium on Performance Analysis of Systems & Software
- 2012
This paper introduces BigHouse, describes its design, and presents case studies for how it has already been applied to build and validate models of data center workloads and systems, and demonstrates its scalability to model large cluster systems while maintaining reasonable simulation time.
InterFS: An Interplanted Distributed File System to Improve Storage Utilization
- Computer Science · APSys
- 2015
This work proposes InterFS, a POSIX-compliant distributed file system aimed at fully exploiting the storage resources on datacenter clusters, which can be interplanted with other resource-intensive services without interfering with them and amply fulfills the storage requirements of small-scale applications in the datacenter.
GrandSLAm: Guaranteeing SLAs for Jobs in Microservices Execution Frameworks
- Computer Science · EuroSys
- 2019
This study presents GrandSLAm, a microservice execution framework that improves utilization of datacenters hosting microservices, and significantly increases throughput by up to 3x compared to the baseline, without violating SLAs for a wide range of real-world AI and ML applications.
References
Showing 1-10 of 169 references
Power provisioning for a warehouse-sized computer
- Computer Science · ISCA '07
- 2007
This paper presents the aggregate power usage characteristics of large collections of servers for different classes of applications over a period of approximately six months, and uses the modelling framework to estimate the potential of power management schemes to reduce peak power and energy usage.
Understanding and Designing New Server Architectures for Emerging Warehouse-Computing Environments
- Computer Science, Business · 2008 International Symposium on Computer Architecture
- 2008
A new solution that incorporates volume non-server-class components in novel packaging solutions, with memory sharing and flash-based disk caching, shows promise, with a 2× improvement on average in performance-per-dollar for the benchmark suite.
Dynamo: amazon's highly available key-value store
- Computer Science · SOSP
- 2007
Dynamo is presented, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience; it makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.
Energy-Efficient Datacenters
- Computer Science · IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
- 2012
The goal of this paper is to provide an introduction to resource provisioning and power or thermal management problems in datacenters, and to review strategies that maximize the datacenter energy efficiency subject to peak or total power consumption and thermal constraints, while meeting stipulated service level agreements in terms of task throughput and/or response time.
A scalable, commodity data center network architecture
- Computer Science · SIGCOMM '08
- 2008
This paper shows how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements and argues that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today's higher-end solutions.
Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
- Computer Science
- 2010
The design of Dapper is introduced, Google’s production distributed systems tracing infrastructure is described, and how its design goals of low overhead, application-level transparency, and ubiquitous deployment on a very large scale system were met are described.
Power management of online data-intensive services
- Computer Science · 2011 38th Annual International Symposium on Computer Architecture (ISCA)
- 2011
This work evaluates the applicability of active and idle low-power modes to reduce the power consumed by the primary server components (processor, memory, and disk), while maintaining tight response time constraints, particularly on 95th-percentile latency.
Managing server energy and operational costs in hosting centers
- Business · SIGMETRICS '05
- 2005
This paper proposes three new online solution strategies based on steady-state queuing analysis, feedback control theory, and a hybrid mechanism borrowing ideas from the first two; when performing server provisioning and speed control, these strategies are more adaptive to workload behavior than earlier heuristics, minimizing operational costs while meeting the SLAs.
The case for RAMClouds: scalable high-performance storage entirely in DRAM
- Computer Science · ACM SIGOPS Oper. Syst. Rev.
- 2010
This paper argues for a new approach to datacenter storage called RAMCloud, where information is kept entirely in DRAM and large-scale systems are created by aggregating the main memories of thousands of commodity servers.
Availability in Globally Distributed Storage Systems
- Computer Science · OSDI
- 2010
This work characterizes the availability properties of cloud storage systems based on an extensive one-year study of Google's main storage infrastructure, and presents statistical models that enable further insight into the impact of multiple design choices, such as data placement and replication strategies.