• Corpus ID: 73725105

Understanding Lifecycle Management Complexity of Datacenter Topologies

@inproceedings{Zhang2019UnderstandingLM,
  title={Understanding Lifecycle Management Complexity of Datacenter Topologies},
  author={Mingyang Zhang and Radhika Niranjan Mysore and Sucha Supittayapornpong and Ramesh Govindan},
  booktitle={NSDI},
  year={2019}
}
Most recent datacenter topology designs have focused on performance properties such as latency and throughput. In this paper, we explore a new dimension, lifecycle management complexity, which attempts to understand the complexity of deploying a topology and expanding it. By analyzing current practice in lifecycle management, we devise complexity metrics for lifecycle management, and show that existing topology classes have low lifecycle management complexity by some measures, but not by… 
A throughput-centric view of the performance of datacenter topologies
TLDR
It is shown that using throughput instead of bisection bandwidth to evaluate datacenter performance can alter conclusions in prior work about datacenter cost, manageability, and reliability.
Gemini: Practical Reconfigurable Datacenter Networks with Topology and Traffic Engineering
TLDR
Gemini’s use of multi-traffic-matrix optimization and hedging avoids the need for frequent topology reconfiguration, with only marginal increases in path length.
Towards highly available Clos-based WAN routers
TLDR
This work explores the design of novel wiring and more sophisticated routing techniques to increase failure resilience in WAN routers, and describes techniques to optimize trunk wiring to increase effective internal router capacity so as to be resilient to internal failures.
Spineless Data Centers
TLDR
This work designs and prototypes an efficient routing scheme for flat networks that uses entirely standard hardware and protocols, and it opens new research directions in topology and routing design that can have significant impact for the most common data centers.
Disaggregating and Consolidating Network Functionalities with SuperNIC
TLDR
This work proposes a network resource pool that consists of a new hardware-based network device called SuperNIC that consolidates network functionalities from multiple endpoints by fairly sharing limited hardware resources, and it achieves its performance goals by an auto-scaled, highly parallel data plane and a scalable control plane.
Performance Evaluation of Data Center Network Topologies via NS-2 Simulations
TLDR
The simulation study shows that with fewer than 20 servers, the DCell topology has lower latency and higher throughput than the other topologies, while the Facebook fat-tree topology performs better when the data center has a large number of servers.
Diamond-Miner: Comprehensive Discovery of the Internet's Topology Diamonds
TLDR
D-Miner is introduced, a system that marries previous work on high-speed probing with multipath discovery to make Internet-wide topology mapping, inclusive of load-balanced paths, feasible and help facilitate better understanding of the Internet’s true structure and resilience.
Sustainability-aware Resource Provisioning in Data Centers
TLDR
This work proposes a two-phase, sustainability-aware resource allocation and management framework for data center lifecycle management. The framework jointly optimizes the impact of the data center manufacturing phase and operational phase without degrading job performance or service quality.
Sundial: Fault-tolerant Clock Synchronization for Datacenters
TLDR
Sundial is a fault-tolerant clock synchronization system for datacenters. It achieves a ∼100ns time-uncertainty bound under various types of failures, which is more than two orders of magnitude lower than state-of-the-art solutions.
COUDER: Robust Topology Engineering for Optical Circuit Switched Data Center Networks
TLDR
This work proposes COUDER, a robust topology and routing optimization framework for reconfigurable optical circuit switched data centers. Compared with cost-equivalent static topologies, COUDER achieves about 20% higher throughput and about 32% lower average hop count.

References

SHOWING 1-10 OF 37 REFERENCES
A cost comparison of datacenter network architectures
TLDR
High-level models of different classes of data center networks are used and compared on cost using both current and predicted trends in cost and power consumption to understand the tradeoffs between different network architectures.
Jupiter rising: a decade of Clos topologies and centralized control in Google's datacenter network
TLDR
The authors built a centralized control mechanism based on a global configuration pushed to all datacenter switches. A modular hardware design coupled with simple, robust software allowed the design to also support inter-cluster and wide-area networks.
Xpander: Towards Optimal-Performance Datacenters
TLDR
It is shown that the benefits of state-of-the-art proposals are derived from the fact that they are (implicitly) utilizing "expander graphs" (aka expanders) as their network topologies, thus unveiling a unifying theme of these proposals.
Slim Fly: A Cost Effective Low-Diameter Network Topology
  • Maciej Besta, T. Hoefler
  • SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, 2014
TLDR
This work proposes deadlock-free routing schemes and physical layouts for large computing centres as well as a detailed cost and power model for Slim Fly, a high-performance cost-effective network topology that approaches the theoretically optimal network diameter.
VL2: a scalable and flexible data center network
TLDR
VL2 is a practical network architecture that scales to support huge data centers with uniform high capacity between servers, performance isolation between services, and Ethernet layer-2 semantics. The authors demonstrate it with a working prototype.
Condor: Better Topologies Through Declarative Design
TLDR
Condor allows architects to express their requirements as constraints via a Topology Description Language (TDL), rather than having to directly specify network structures, and uses constraint-based synthesis to rapidly generate candidate topologies, which can be analyzed against multiple criteria.
F10: A Fault-Tolerant Engineered Network
TLDR
This work creates an engineered network and routing protocol that can almost instantaneously reestablish connectivity and load balance, even in the presence of multiple failures, and shows that following network link and switch failures, F10 has less than 1/7th the packet loss of current schemes.
Evolve or Die: High-Availability Design Principles Drawn from Google's Network Infrastructure
TLDR
From a detailed analysis of over 100 high-impact failure events in a global-scale content provider encompassing several data centers and two WANs, it is found that failures are evenly distributed across different network types and planes, but that a large number of failures happen when a management operation is in progress within the network.
A scalable, commodity data center network architecture
TLDR
This paper shows how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements and argues that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today's higher-end solutions.
REWIRE: An optimization-based framework for unstructured data center network design
TLDR
This work presents REWIRE, a data center network design framework that uses an optimization algorithm to design networks. It significantly outperforms previous solutions, achieving up to 100-500% more bisection bandwidth and lower end-to-end network latency than equivalent-cost DCNs built with best practices.
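The commodity-switch fat tree listed above shows why deployment effort grows quickly with scale. As a rough illustration, its size can be derived from the switch port count k alone. The Python sketch below assumes the standard three-tier k-ary fat tree: k pods, each with k/2 edge and k/2 aggregation switches, plus (k/2)^2 core switches, all built from identical k-port switches.

The helper `fat_tree_size` and its field names are hypothetical. The switch-to-switch link count is only a raw wiring tally. It is not one of the lifecycle management complexity metrics proposed in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FatTreeSize:
    """Component counts for a textbook k-ary fat tree (illustrative only)."""
    hosts: int
    edge_switches: int
    aggregation_switches: int
    core_switches: int
    switch_links: int  # edge-aggregation links plus aggregation-core links

    @property
    def total_switches(self) -> int:
        return self.edge_switches + self.aggregation_switches + self.core_switches


def fat_tree_size(k: int) -> FatTreeSize:
    """Size a three-tier fat tree built from identical k-port switches.

    Hypothetical helper: assumes the standard k-ary construction, not any
    specific deployment or the lifecycle metrics from the paper.
    """
    if k < 2 or k % 2:
        raise ValueError("k must be an even integer >= 2")
    half = k // 2
    edge = k * half              # k pods, k/2 edge switches per pod
    agg = k * half               # k pods, k/2 aggregation switches per pod
    core = half * half           # (k/2)^2 core switches
    hosts = edge * half          # each edge switch uses k/2 ports for hosts
    edge_agg = k * half * half   # full bipartite mesh inside each pod
    agg_core = agg * half        # each aggregation switch has k/2 uplinks
    return FatTreeSize(hosts, edge, agg, core, edge_agg + agg_core)


if __name__ == "__main__":
    size = fat_tree_size(48)
    print(size.hosts, size.total_switches, size.switch_links)
```

With 48-port switches, this count gives 27,648 hosts, 2,880 switches, and 55,296 switch-to-switch links. That is consistent with the "tens of thousands of elements" scale mentioned in the entry above. Every one of those links must be cabled at deployment time and potentially rewired during expansion.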