The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition

@book{Barroso2013TheDA,
  title={The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition},
  author={Luiz Andr{\'e} Barroso and Jimmy Clidaras and Urs H{\"o}lzle},
  publisher={Morgan \& Claypool Publishers},
  year={2013}
}
Abstract

As computation continues to move into the cloud, the computing platform of interest no longer resembles a pizza box or a refrigerator, but a warehouse full of computers. These new large datacenters are quite different from traditional hosting facilities of earlier times and cannot be viewed simply as a collection of co-located servers. Large portions of the hardware and software resources in these facilities must work in concert to efficiently deliver good levels of Internet service…
Citations

Profiling a warehouse-scale computer
TLDR
A detailed microarchitectural analysis of live datacenter jobs, measured on more than 20,000 Google machines over a three-year period and comprising thousands of different applications, finds that WSC workloads are extremely diverse, creating a need for architectures that can tolerate application variability without performance loss.
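As a rough illustration of the kind of per-application counter analysis such a study involves (not the paper's profiling pipeline; all values below are invented), the sketch computes per-application IPC and uses its spread across applications as a crude diversity measure:

```python
# Illustrative sketch only: per-application IPC from hypothetical
# cycle/instruction counter samples, plus the spread across applications.
from statistics import mean, pstdev

# Hypothetical counter samples: app name -> list of (instructions, cycles)
samples = {
    "websearch": [(8.0e9, 1.0e10), (7.5e9, 9.8e9)],
    "bigtable":  [(4.2e9, 1.1e10), (4.0e9, 1.0e10)],
    "video":     [(1.5e10, 9.0e9), (1.4e10, 8.8e9)],
}

ipc = {app: mean(i / c for i, c in pts) for app, pts in samples.items()}
for app, v in sorted(ipc.items()):
    print(f"{app:10s} IPC = {v:.2f}")

vals = list(ipc.values())
cov = pstdev(vals) / mean(vals)  # coefficient of variation as a diversity proxy
print(f"IPC coefficient of variation across apps: {cov:.2f}")
```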
Chariots: A Scalable Shared Log for Data Management in Multi-Datacenter Cloud Environments
TLDR
This work proposes a novel distributed log store, called the Fractal Log Store (FLStore), that overcomes the bottleneck of a single point of contention in shared log infrastructures, and proposes Chariots, which provides multi-datacenter replication for shared logs.
Rack-Scale Memory Pooling for Datacenters
TLDR
This thesis proposes rack-scale memory pooling (RSMP), a new scaling technique for future datacenters that reduces networking overheads and improves the performance of core datacenter software, along with an RSMP design that leverages integration and a NUMA fabric to narrow the gap between local and remote memory to only a 5× difference in access latency.
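A back-of-the-envelope sketch of what that 5× local-to-remote gap implies for average access time, under an assumed local latency and remote-access fraction (illustrative numbers only):

```python
# Effect of a 5x local-to-remote latency gap (as cited in the TLDR) on
# average memory access time; local latency and remote fraction are assumed.
LOCAL_NS = 90.0           # assumed local DRAM latency
REMOTE_NS = 5 * LOCAL_NS  # pooled memory reached over the NUMA fabric

for remote_fraction in (0.0, 0.1, 0.25, 0.5):
    amat = (1 - remote_fraction) * LOCAL_NS + remote_fraction * REMOTE_NS
    print(f"{remote_fraction:4.0%} remote -> average access time {amat:6.1f} ns")
```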
Architecting Efficient Data Centers
TLDR
The PowerNap server architecture is introduced, a coordinated full-system idle low-power mode that transitions in and out of an ultra-low-power nap state to save power during brief idle periods, along with DreamWeaver, architectural support for deep sleep.
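A minimal sketch of the nap-during-brief-idle-periods idea (not the PowerNap implementation; wattages, transition time, and the activity trace are assumed):

```python
# Minimal sketch: during each idle gap longer than the nap transition time,
# the server drops to nap power; otherwise it stays at idle power.
P_BUSY, P_IDLE, P_NAP = 300.0, 150.0, 15.0   # watts (assumed)
T_TRANSITION = 0.01                          # seconds to enter+exit nap (assumed)

# Hypothetical trace of (state, duration_seconds) intervals.
trace = [("busy", 0.2), ("idle", 0.5), ("busy", 0.1), ("idle", 0.005), ("idle", 2.0)]

def energy(trace, nap_enabled):
    joules = 0.0
    for state, dur in trace:
        if state == "busy":
            joules += P_BUSY * dur
        elif nap_enabled and dur > T_TRANSITION:
            joules += P_IDLE * T_TRANSITION + P_NAP * (dur - T_TRANSITION)
        else:
            joules += P_IDLE * dur
    return joules

base, nap = energy(trace, False), energy(trace, True)
print(f"baseline {base:.1f} J, with nap {nap:.1f} J, saving {1 - nap / base:.1%}")
```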
Porting LibRIPC to iWARP
TLDR
A port of LibRIPC to iWARP is presented, enabling the library to run over Ethernet, one of the most cost-efficient network fabrics, which has offered high-performance networking since the advent of standards specifying data rates of 10 Gbit/s and above.
BigHouse: A simulation infrastructure for data center systems
TLDR
This paper introduces BigHouse, describes its design, and presents case studies for how it has already been applied to build and validate models of data center workloads and systems, and demonstrates its scalability to model large cluster systems while maintaining reasonable simulation time.
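In the same spirit, a tiny stochastic queuing simulation (plain Python, not BigHouse itself) of an M/M/k cluster reporting median and tail latency from sampled interarrival and service times:

```python
# Toy discrete-event simulation of an M/M/k cluster; rates are assumed.
import heapq, random
random.seed(1)

def simulate_mmk(arrival_rate, service_rate, servers, n_jobs):
    free_at = [0.0] * servers          # next time each server is free
    heapq.heapify(free_at)
    t, latencies = 0.0, []
    for _ in range(n_jobs):
        t += random.expovariate(arrival_rate)    # next arrival time
        start = max(t, heapq.heappop(free_at))   # wait for the earliest free server
        finish = start + random.expovariate(service_rate)
        heapq.heappush(free_at, finish)
        latencies.append(finish - t)
    latencies.sort()
    return latencies[len(latencies) // 2], latencies[int(0.95 * len(latencies))]

p50, p95 = simulate_mmk(arrival_rate=80.0, service_rate=10.0, servers=10, n_jobs=100_000)
print(f"median latency {p50*1000:.1f} ms, 95th percentile {p95*1000:.1f} ms")
```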
InterFS: An Interplanted Distributed File System to Improve Storage Utilization
TLDR
This work proposes InterFS, a POSIX-compliant distributed file system that aims to fully exploit the storage resources of data center clusters; it can be interplanted with other resource-intensive services without interfering with them and amply fulfills the storage requirements of small-scale applications in the data center.
GrandSLAm: Guaranteeing SLAs for Jobs in Microservices Execution Frameworks
TLDR
This study presents GrandSLAm, a microservice execution framework that improves utilization of datacenters hosting microservices, and significantly increases throughput by up to 3x compared to the baseline, without violating SLAs for a wide range of real-world AI and ML applications.
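A hedged sketch of the per-stage latency budgeting idea behind such SLA-aware frameworks (hypothetical stage names and latencies; not GrandSLAm's code), splitting an end-to-end SLA across pipeline stages in proportion to their measured latencies:

```python
# Split an end-to-end SLA into per-stage budgets; leftover slack is the
# room a scheduler has for batching or queuing at each stage.
END_TO_END_SLA_MS = 200.0                      # assumed end-to-end target

# Hypothetical pipeline: stage name -> measured mean latency (ms)
measured = {"asr": 40.0, "nlp": 25.0, "recommend": 15.0}

total = sum(measured.values())
sub_sla = {stage: END_TO_END_SLA_MS * lat / total for stage, lat in measured.items()}
for stage, budget in sub_sla.items():
    slack = budget - measured[stage]
    print(f"{stage:10s} budget {budget:6.1f} ms, slack {slack:6.1f} ms")
```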

References

Showing 1-10 of 169 references
Power provisioning for a warehouse-sized computer
TLDR
This paper presents the aggregate power usage characteristics of large collections of servers for different classes of applications over a period of approximately six months, and uses the modelling framework to estimate the potential of power management schemes to reduce peak power and energy usage.
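The core observation can be illustrated with synthetic traces: the peak of the aggregate draw sits well below the sum of per-machine peaks, and that gap is the headroom oversubscription exploits (the numbers below are made up, not the paper's measurements):

```python
# Compare sum-of-peaks (what nameplate provisioning assumes) with
# peak-of-sum (what the facility actually sees) over synthetic traces.
import random
random.seed(0)

MACHINES, SAMPLES, PEAK_W = 1000, 500, 300.0
traces = [[random.uniform(0.4, 1.0) * PEAK_W for _ in range(SAMPLES)]
          for _ in range(MACHINES)]

sum_of_peaks = sum(max(t) for t in traces)
aggregate = [sum(t[i] for t in traces) for i in range(SAMPLES)]
peak_of_sum = max(aggregate)

print(f"sum of per-machine peaks: {sum_of_peaks/1000:.0f} kW")
print(f"peak of aggregate draw:   {peak_of_sum/1000:.0f} kW")
print(f"oversubscription headroom: {1 - peak_of_sum/sum_of_peaks:.1%}")
```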
Understanding and Designing New Server Architectures for Emerging Warehouse-Computing Environments
TLDR
A new solution that incorporates volume non-server-class components in novel packaging, with memory sharing and flash-based disk caching, shows promise, delivering an average 2X improvement in performance-per-dollar on the benchmark suite.
Dynamo: amazon's highly available key-value store
TLDR
Dynamo is presented, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience; it makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.
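A minimal consistent-hashing sketch in the style of Dynamo's partitioning (virtual nodes on a hash ring, with the next N distinct nodes forming a key's preference list); an illustration, not Amazon's code:

```python
# Virtual nodes on a hash ring; a key's replicas are the next N distinct
# physical nodes encountered walking the ring clockwise.
import bisect, hashlib

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=16, replicas=3):
        self.replicas = replicas
        self.ring = sorted((h(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))
        self.keys = [k for k, _ in self.ring]

    def preference_list(self, key):
        idx = bisect.bisect(self.keys, h(key)) % len(self.ring)
        owners, i = [], idx
        while len(owners) < self.replicas:          # walk clockwise
            node = self.ring[i % len(self.ring)][1]
            if node not in owners:                  # skip the same node's other vnodes
                owners.append(node)
            i += 1
        return owners

ring = Ring(["A", "B", "C", "D", "E"])
print(ring.preference_list("user:42"))   # three distinct nodes, e.g. ['C', 'E', 'A']
```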
Energy-Efficient Datacenters (Massoud Pedram, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2012)
TLDR
The goal of this paper is to provide an introduction to resource provisioning and power or thermal management problems in datacenters, and to review strategies that maximize the datacenter energy efficiency subject to peak or total power consumption and thermal constraints, while meeting stipulated service level agreements in terms of task throughput and/or response time.
A scalable, commodity data center network architecture
TLDR
This paper shows how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements and argues that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today's higher-end solutions.
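The sizing arithmetic for the k-ary fat tree used in that paper is simple enough to show directly: k pods, (k/2)^2 core switches, 5k^2/4 switches in total, and k^3/4 hosts at full bisection bandwidth.

```python
# k-ary fat-tree sizing from the port count k of the commodity switches.
def fat_tree(k):
    assert k % 2 == 0, "port count k must be even"
    hosts = k ** 3 // 4
    edge = aggregation = k * k // 2      # k/2 of each per pod, k pods
    core = (k // 2) ** 2
    return {"pods": k, "hosts": hosts, "edge": edge,
            "aggregation": aggregation, "core": core}

for k in (4, 16, 48):
    print(k, fat_tree(k))   # k=48 gives 27648 hosts from 2880 switches
```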
Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
TLDR
The design of Dapper, Google's production distributed systems tracing infrastructure, is introduced, describing how its design goals of low overhead, application-level transparency, and ubiquitous deployment on a very large scale were met.
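A minimal sketch of the span/trace-context idea that Dapper popularized (not Google's implementation): each unit of work records a span carrying the trace id and its parent span id, so spans can later be stitched into a tree.

```python
# Toy trace-context propagation: child spans inherit the trace id and
# reference their parent span id.
import time, uuid

class Span:
    def __init__(self, name, trace_id=None, parent_id=None):
        self.name = name
        self.trace_id = trace_id or uuid.uuid4().hex
        self.span_id = uuid.uuid4().hex[:16]
        self.parent_id = parent_id
        self.start = time.time()

    def child(self, name):
        return Span(name, trace_id=self.trace_id, parent_id=self.span_id)

    def finish(self):
        # In a real system the record is logged locally and collected out of band.
        print(f"trace={self.trace_id[:8]} span={self.name} parent={self.parent_id} "
              f"took {(time.time() - self.start) * 1000:.2f} ms")

root = Span("frontend.request")
backend = root.child("backend.rpc")
time.sleep(0.01)
backend.finish()
root.finish()
```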
Power management of online data-intensive services
TLDR
This work evaluates the applicability of active and idle low-power modes to reduce the power consumed by the primary server components (processor, memory, and disk), while maintaining tight response time constraints, particularly on 95th-percentile latency.
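An illustrative, entirely synthetic example of why idle modes are hard for such services: a wake-up penalty on requests that arrive while a component sleeps can push 95th-percentile latency past the SLA even when the mean barely moves.

```python
# Synthetic latency samples plus an assumed wake-up penalty on a fraction
# of requests; compare mean and 95th-percentile latency against the SLA.
import random
random.seed(2)

SLA_P95_MS, WAKE_MS, SLEEP_FRACTION = 100.0, 40.0, 0.3
base = [random.expovariate(1 / 30.0) for _ in range(100_000)]   # ~30 ms mean
with_wake = [l + (WAKE_MS if random.random() < SLEEP_FRACTION else 0.0) for l in base]

def p95(xs):
    xs = sorted(xs)
    return xs[int(0.95 * len(xs))]

for label, xs in (("baseline", base), ("with wake penalty", with_wake)):
    tail = p95(xs)
    status = "OK" if tail <= SLA_P95_MS else "SLA violated"
    print(f"{label:18s} mean {sum(xs)/len(xs):6.1f} ms  p95 {tail:6.1f} ms  {status}")
```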
Managing server energy and operational costs in hosting centers
TLDR
This paper proposes three new online solution strategies based on steady-state queuing analysis, feedback control theory, and a hybrid mechanism borrowing ideas from both; when performing server provisioning and speed control, these are more adaptive to workload behavior than earlier heuristics at minimizing operational costs while meeting the SLAs.
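The steady-state queuing intuition behind such provisioning can be sketched by modeling each active server as an M/M/1 queue fed an equal share of the load and picking the smallest server count whose mean response time meets the SLA (assumed parameters; not the paper's controller):

```python
# Smallest n such that mean M/M/1 response time 1/(mu - lambda/n) <= SLA.
import math

def servers_needed(arrival_rate, service_rate, sla_seconds):
    n = math.ceil(arrival_rate / service_rate) + 1        # keep utilization < 1
    while True:
        per_server = arrival_rate / n
        resp = 1.0 / (service_rate - per_server)          # M/M/1 mean response time
        if resp <= sla_seconds:
            return n, resp
        n += 1

n, resp = servers_needed(arrival_rate=940.0, service_rate=100.0, sla_seconds=0.02)
print(f"provision {n} servers (mean response time {resp*1000:.1f} ms)")
```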
The case for RAMClouds: scalable high-performance storage entirely in DRAM
TLDR
This paper argues for a new approach to datacenter storage called RAMCloud, where information is kept entirely in DRAM and large-scale systems are created by aggregating the main memories of thousands of commodity servers.
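A back-of-the-envelope sketch of the RAMCloud premise, with assumed per-server DRAM capacity and throughput (not figures from the paper): how many commodity servers it takes to keep a dataset entirely in DRAM, and the aggregate throughput that buys.

```python
# Servers needed to hold a dataset in DRAM, and resulting aggregate ops/sec.
import math

DATASET_TB = 50
DRAM_PER_SERVER_GB = 256          # assumed usable DRAM per server
OPS_PER_SERVER = 1_000_000        # assumed small-object reads/sec per server

servers = math.ceil(DATASET_TB * 1024 / DRAM_PER_SERVER_GB)
print(f"{servers} servers to hold {DATASET_TB} TB in DRAM, "
      f"~{servers * OPS_PER_SERVER / 1e6:.0f} M ops/sec aggregate")
```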
Availability in Globally Distributed Storage Systems
TLDR
This work characterizes the availability properties of cloud storage systems based on an extensive one-year study of Google's main storage infrastructure and presents statistical models that enable further insight into the impact of multiple design choices, such as data placement and replication strategies.
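A toy replication-availability calculation (not the paper's statistical models) shows why the independence assumption matters; since the paper finds that failures are often correlated, numbers like these are best read as an upper bound.

```python
# With R replicas and independent per-node availability a, data is
# unavailable only when all R replicas are down simultaneously.
NODE_AVAILABILITY = 0.999          # assumed per-node availability

for replicas in (1, 2, 3):
    unavailable = (1 - NODE_AVAILABILITY) ** replicas
    print(f"R={replicas}: availability {1 - unavailable:.9f} "
          f"(~{unavailable * 525_600:.3f} minutes/year unavailable)")
```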