Gemini: Practical Reconfigurable Datacenter Networks with Topology and Traffic Engineering
@article{Zhang2021GeminiPR, title={Gemini: Practical Reconfigurable Datacenter Networks with Topology and Traffic Engineering}, author={Mingyang Zhang and Jianan Zhang and Rui Wang and Ramesh Govindan and Jeffrey C. Mogul and Amin Vahdat}, journal={ArXiv}, year={2021}, volume={abs/2110.08374} }
To reduce cost, datacenter network operators are exploring blocking network designs. An example of such a design is a "spine-free" form of a Fat-Tree, in which pods directly connect to each other, rather than via spine blocks. To maintain application-perceived performance in the face of dynamic workloads, these new designs must be able to reconfigure routing and the inter-pod topology. Gemini is a system designed to achieve these goals on commodity hardware while reconfiguring the network…
8 Citations
Dynamic Demand-Aware Link Scheduling for Reconfigurable Datacenters
- Computer Science
- ArXiv
- 2023
An extensive empirical evaluation finds that dynamic algorithms can both improve the running time and reduce the number of changes to the configuration, especially in networks with high temporal locality, while retaining matching weight.
Duo: A High-Throughput Reconfigurable Datacenter Network Using Local Routing and Control
- Computer Science
- Proc. ACM Meas. Anal. Comput. Syst.
- 2023
Duo is a novel demand-aware reconfigurable rack-to-rack datacenter network design with a simple and efficient control plane based on the well-known de Bruijn topology. Duo is shown to provide higher throughput, shorter paths, lower flow completion times for high-priority flows, and minimal packet reordering, all using existing network and transport layer protocols.
Jupiter evolving: transforming google's datacenter network via optical circuit switches and software-defined networking
- Computer Science
- SIGCOMM
- 2022
It is shown that the combination of traffic and topology engineering on direct-connect fabrics achieves throughput similar to that of Clos fabrics for the authors' production traffic patterns, and that OCS achieves 3x faster fabric reconfiguration compared to pre-evolution Clos fabrics.
Fast and Heavy Disjoint Weighted Matchings for Demand-Aware Datacenter Topologies
- Computer Science
- IEEE INFOCOM 2022 - IEEE Conference on Computer Communications
- 2022
This paper initiates the study of fast algorithms to find k disjoint heavy matchings in graphs, and presents and analyzes six algorithms, based on iterative matchings, b-matching, edge coloring, and node-rankings.
Hashing Design in Modern Networks: Challenges and Mitigation Techniques
- Computer Science
- USENIX Annual Technical Conference
- 2022
A novel approach named color recombining is proposed, which enables hash function reuse by leveraging topology traits of multi-stage DCN networks, and a novel framework based on coprime theory to mitigate hash correlation in generic mesh topologies is described.
Machine-Learning-Aided Dynamic Reconfiguration in Optical DC/HPC Networks (Invited)
- Computer Science
- 2022 International Conference on Optical Network Design and Modeling (ONDM)
- 2022
A dynamic network reconfiguration mechanism is presented that can satisfy time-varying application demands in an optical DC/HPC network and can improve end-to-end packet latency and packet loss rate.
Kevin: de Bruijn-based topology with demand-aware links and greedy routing
- Computer Science
- ArXiv
- 2022
Kevin is a novel demand-aware reconfigurable rack-to-rack datacenter network realized with a simple and efficient control plane based on a de Bruijn topology in which static links are enhanced with opportunistic links.
References
Beyond fat-trees without antennae, mirrors, and disco-balls
- Computer Science
- SIGCOMM
- 2017
The results substantially lower the barriers for improving upon today's data centers by showing that a static, cabling-friendly topology built using commodity equipment yields superior performance when combined with well-understood routing methods.
Evolving Requirements and Trends of Datacenters Networks
- Computer Science, Physics
- 2020
An overview of Google’s datacenter network is presented, which has led and defined the industry over the past few decades, and future technology directions for scaling bandwidth through a combination of higher baud rates, wavelength-division multiplexing, coherent communication, and space-division multiplexing are discussed.
RotorNet: A Scalable, Low-complexity, Optical Datacenter Network
- Computer Science
- SIGCOMM
- 2017
While RotorNet dynamically reconfigures its constituent circuit switches, it decouples switch configuration from traffic patterns, obviating the need for demand collection and admitting a fully decentralized control plane.
Sirius: A Flat Datacenter Network with Nanosecond Optical Switching
- Computer Science, Physics
- SIGCOMM
- 2020
Sirius, an optically-switched network for datacenters providing the abstraction of a single, high-radix switch that can connect thousands of nodes (racks or servers) in a datacenter while achieving nanosecond-granularity reconfiguration, is proposed.
A scalable, commodity data center network architecture
- Computer Science
- SIGCOMM '08
- 2008
This paper shows how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements and argues that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today's higher-end solutions.
Designing a Predictable Internet Backbone with Valiant Load-Balancing
- Computer Science
- IWQoS
- 2005
It is shown that the same qualities of service can be achieved in a realistic heterogeneous backbone network in the sense that the capacity required by VLB is very close to the lower bound of total capacity needed by any architecture in order to support all traffic matrices.
Network architecture for joint failure recovery and traffic engineering
- Computer Science
- PERV
- 2011
This paper proposes a unified way to balance load efficiently under a wide range of failure scenarios, and presents and solves the optimization problems that compute the configuration state for each router.
Condor: Better Topologies Through Declarative Design
- Computer Science
- Comput. Commun. Rev.
- 2015
Condor allows architects to express their requirements as constraints via a Topology Description Language (TDL), rather than having to directly specify network structures, and uses constraint-based synthesis to rapidly generate candidate topologies, which can be analyzed against multiple criteria.
zUpdate: updating data center networks with zero loss
- Computer Science
- SIGCOMM
- 2013
This work develops novel techniques to handle several practical challenges in realizing zUpdate, implements a zUpdate prototype on OpenFlow switches, and deploys it on a testbed that resembles a real DCN topology.
Expanding across time to deliver bandwidth efficiency and low latency
- Computer Science
- NSDI
- 2020
Opera is presented, a dynamic network that delivers latency-sensitive traffic quickly by relying on multi-hop forwarding in the same way as expander-graph-based approaches, but provides near-optimal bandwidth for bulk flows through direct forwarding over time-varying source-to-destination circuits.