Corpus ID: 239016368

Gemini: Practical Reconfigurable Datacenter Networks with Topology and Traffic Engineering

@article{Zhang2021GeminiPR,
  title={Gemini: Practical Reconfigurable Datacenter Networks with Topology and Traffic Engineering},
  author={Mingyang Zhang and Jianan Zhang and Rui Wang and Ramesh Govindan and Jeffrey C. Mogul and Amin Vahdat},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.08374}
}
To reduce cost, datacenter network operators are exploring blocking network designs. An example of such a design is a "spine-free" form of a Fat-Tree, in which pods directly connect to each other, rather than via spine blocks. To maintain application-perceived performance in the face of dynamic workloads, these new designs must be able to reconfigure routing and the inter-pod topology. Gemini is a system designed to achieve these goals on commodity hardware while reconfiguring the network… 

Dynamic Demand-Aware Link Scheduling for Reconfigurable Datacenters

An extensive empirical evaluation finds that dynamic algorithms can both improve the running time and reduce the number of changes to the configuration, especially in networks with high temporal locality, while retaining matching weight.

Duo: A High-Throughput Reconfigurable Datacenter Network Using Local Routing and Control

Duo is a novel demand-aware reconfigurable rack-to-rack datacenter network design, realized with a simple and efficient control plane based on the well-known de Bruijn topology. Duo is shown to provide higher throughput, shorter paths, lower flow completion times for high-priority flows, and minimal packet reordering, all using existing network and transport layer protocols.

Jupiter evolving: transforming google's datacenter network via optical circuit switches and software-defined networking

It is shown that combining traffic and topology engineering on direct-connect fabrics achieves throughput similar to that of Clos fabrics for the authors' production traffic patterns, and that OCSes enable 3x faster fabric reconfiguration than the pre-evolution Clos fabric.

Fast and Heavy Disjoint Weighted Matchings for Demand-Aware Datacenter Topologies

This paper initiates the study of fast algorithms to find k disjoint heavy matchings in graphs, and presents and analyzes six algorithms, based on iterative matchings, b-matching, edge coloring, and node-rankings.
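The "iterative matchings" strategy named above can be sketched as: compute one heavy matching, remove its edges from consideration, and repeat k times. This is a minimal sketch, assuming a greedy (1/2-approximate) matching in each round and an undirected demand graph given as a dict of edge weights; the function names are hypothetical and this is not presented as the paper's exact algorithm.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def greedy_matching(weights: Dict[Edge, float], used: Set[Edge]) -> List[Edge]:
    """Scan edges by descending weight; keep an edge if neither endpoint is
    already matched in this round and the edge was not used in an earlier round."""
    matched_nodes: Set[int] = set()
    matching: List[Edge] = []
    for (u, v), _w in sorted(weights.items(), key=lambda kv: -kv[1]):
        if (u, v) in used or u in matched_nodes or v in matched_nodes:
            continue
        matching.append((u, v))
        matched_nodes.update((u, v))
    return matching


def k_disjoint_matchings(weights: Dict[Edge, float], k: int) -> List[List[Edge]]:
    """Hypothetical helper: k edge-disjoint heavy matchings, one per round."""
    used: Set[Edge] = set()
    rounds: List[List[Edge]] = []
    for _ in range(k):
        m = greedy_matching(weights, used)
        used.update(m)  # edges chosen here are excluded from later rounds
        rounds.append(m)
    return rounds
```

Each matching could be installed as one circuit-switch configuration; greedy is used here only for brevity, and a b-matching or edge-coloring approach (also named above) would replace the per-round step.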

Hashing Design in Modern Networks: Challenges and Mitigation Techniques

This paper is included in the Proceedings of the 2022 USENIX Annual Technical Conference. It proposes a novel approach named color recombining, which enables hash functions to be reused by leveraging the topology traits of multi-stage DCNs, and describes a novel framework based on coprime theory to mitigate hash correlation in generic mesh topologies.
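Hash correlation is easiest to see in a two-stage example. The sketch below assumes (an illustrative assumption, not necessarily the paper's model) that both stages compute the same hash value h for a flow and pick an egress port as h modulo their own port count; it counts how one stage-2 switch spreads the flows that stage 1 sent to it. The function name and parameters are hypothetical.

```python
from collections import Counter


def stage2_port_counts(num_flows: int, stage1_ports: int,
                       stage2_ports: int, chosen_stage1_port: int) -> Counter:
    """Flows whose hash selected `chosen_stage1_port` at stage 1 all reach the
    same stage-2 switch, which reuses the identical hash value modulo its own
    port count. Returns how many of those flows land on each stage-2 port."""
    counts: Counter = Counter()
    for h in range(num_flows):  # enumerate hash values as if uniformly spread
        if h % stage1_ports == chosen_stage1_port:
            counts[h % stage2_ports] += 1
    return counts
```

With equal port counts (4 and 4) every such flow lands on a single port; with 4 and 6 (gcd 2) only half of the ports are used; with coprime counts such as 4 and 3, the Chinese remainder theorem spreads the flows evenly over all ports.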

Machine-Learning-Aided Dynamic Reconfiguration in Optical DC/HPC Networks (Invited)

A dynamic network reconfiguration mechanism is presented that can satisfy applications' time-varying demands in an optical DC/HPC network and can improve end-to-end packet latency and packet loss rate.

Kevin: de Bruijn-based topology with demand-aware links and greedy routing

Kevin is a novel demand-aware reconfigurable rack-to-rack datacenter network realized with a simple and efficient control plane based on a de Bruijn topology, in which static links are enhanced with opportunistic links.
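Both Duo and Kevin build on the de Bruijn topology, whose static links support routing by label shifting: each hop drops the first symbol of the current node's label and appends one symbol of the destination's label. The sketch below shows only this classic shift routing over static links, assuming string labels of equal length; it does not model Kevin's opportunistic links or its exact greedy rule, and the function name is hypothetical.

```python
def de_bruijn_route(src: str, dst: str) -> list[str]:
    """Route in a de Bruijn graph whose nodes are equal-length strings.
    A hop from node x goes to x[1:] + symbol."""
    if len(src) != len(dst):
        raise ValueError("labels must have equal length")
    n = len(src)
    # Largest k such that the last k symbols of src equal the first k of dst;
    # those symbols are already in place, so only n - k shifts are needed.
    overlap = next(k for k in range(n, -1, -1) if src[n - k:] == dst[:k])
    path = [src]
    node = src
    for symbol in dst[overlap:]:
        node = node[1:] + symbol  # shift left, append next destination symbol
        path.append(node)
    return path
```

For example, routing from `0110` to `1011` reuses the overlap `10` and takes two hops (`0110`, `1101`, `1011`); in general any node is reached in at most n hops.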

References


Beyond fat-trees without antennae, mirrors, and disco-balls

The results substantially lower the barriers for improving upon today's data centers by showing that a static, cabling-friendly topology built using commodity equipment yields superior performance when combined with well-understood routing methods.

Evolving Requirements and Trends of Datacenters Networks

This paper presents an overview of Google’s datacenter network, which has led and defined the industry over the past few decades, and discusses future technology directions for scaling bandwidth through a combination of higher baud rates, wavelength-division multiplexing, coherent communication, and space-division multiplexing.

RotorNet: A Scalable, Low-complexity, Optical Datacenter Network

While RotorNet dynamically reconfigures its constituent circuit switches, it decouples switch configuration from traffic patterns, obviating the need for demand collection and admitting a fully decentralized control plane.
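The traffic-oblivious idea can be sketched as a fixed round-robin of matchings: in slot i, rack r's uplink connects to rack (r + i) mod N, so every ordered rack pair gets a direct circuit once per cycle regardless of demand. This is a simplification under stated assumptions: RotorNet distributes its matchings across several rotor switches, whereas this sketch uses one cyclic-shift schedule, and both function names are hypothetical.

```python
def rotor_schedule(num_racks: int) -> list[dict[int, int]]:
    """Traffic-oblivious schedule: slot i (shift i = 1..num_racks-1) maps each
    source rack r to destination rack (r + i) % num_racks."""
    return [{r: (r + i) % num_racks for r in range(num_racks)}
            for i in range(1, num_racks)]


def slot_connecting(schedule: list[dict[int, int]], src: int, dst: int) -> int:
    """Index of the slot in which src has a direct circuit to dst."""
    return next(i for i, m in enumerate(schedule) if m[src] == dst)
```

Because the schedule never depends on traffic, no demand collection is needed; each node only has to know the current slot.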

Sirius: A Flat Datacenter Network with Nanosecond Optical Switching

Sirius is an optically switched network for datacenters that provides the abstraction of a single high-radix switch, which can connect thousands of nodes (racks or servers) in a datacenter while achieving nanosecond-granularity reconfiguration.

A scalable, commodity data center network architecture

This paper shows how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements and argues that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today's higher-end solutions.
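The commodity fat-tree described in this work is built from identical k-port switches: k pods, each with k/2 edge and k/2 aggregation switches, plus (k/2)^2 core switches, supporting k^3/4 hosts. The helper below (a hypothetical name) just evaluates those counts; the core layer it reports is the spine that Gemini's "spine-free" variant removes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FatTreeSize:
    pods: int
    edge_switches: int
    aggregation_switches: int
    core_switches: int
    hosts: int


def fat_tree_size(k: int) -> FatTreeSize:
    """Element counts for a k-ary fat-tree of identical k-port switches."""
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even port count")
    half = k // 2
    return FatTreeSize(
        pods=k,
        edge_switches=k * half,         # k/2 edge switches per pod
        aggregation_switches=k * half,  # k/2 aggregation switches per pod
        core_switches=half * half,      # (k/2)^2 core (spine) switches
        hosts=k * half * half,          # k/2 hosts per edge switch = k^3/4
    )
```

With 48-port switches this gives 27,648 hosts from 2,880 switches, of which 576 are core switches.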

Designing a Predictable Internet Backbone with Valiant Load-Balancing

It is shown that the same qualities of service can be achieved in a realistic heterogeneous backbone network in the sense that the capacity required by VLB is very close to the lower bound of total capacity needed by any architecture in order to support all traffic matrices.
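In the homogeneous case, Valiant Load-Balancing on a full mesh of N nodes, each with ingress and egress rate at most r, sends every flow in two phases via all N nodes as intermediates in equal shares; each link then carries at most 2r/N for any admissible traffic matrix. The sketch below computes per-link loads for a given matrix under that even fractional split (an assumption standing in for per-packet or per-flow random choice); it does not model the heterogeneous backbone analyzed in the cited work, and the function name is hypothetical.

```python
def vlb_link_loads(traffic: list[list[float]]) -> list[list[float]]:
    """traffic[s][d] is the rate from node s to node d (zero diagonal assumed).
    Returns load[i][j] on the mesh link i -> j under two-phase VLB."""
    n = len(traffic)
    load = [[0.0] * n for _ in range(n)]
    for s in range(n):
        for d in range(n):
            share = traffic[s][d] / n  # split evenly over all n intermediates
            for m in range(n):
                if m != s:
                    load[s][m] += share  # phase 1: source -> intermediate
                if m != d:
                    load[m][d] += share  # phase 2: intermediate -> destination
    return load
```

For N = 4 and r = 1, both a permutation matrix and a uniform all-to-all matrix keep every link at or below 2r/N = 0.5, which is why the mesh can be provisioned once for all traffic matrices.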

Network architecture for joint failure recovery and traffic engineering

This paper proposes a unified way to balance load efficiently under a wide range of failure scenarios, and presents and solves the optimization problems that compute the configuration state for each router.

Condor: Better Topologies Through Declarative Design

Condor allows architects to express their requirements as constraints via a Topology Description Language (TDL), rather than having to directly specify network structures, and uses constraint-based synthesis to rapidly generate candidate topologies, which can be analyzed against multiple criteria.

zUpdate: updating data center networks with zero loss

This work develops novel techniques to handle several practical challenges in realizing zUpdate, implements a zUpdate prototype on OpenFlow switches, and deploys it on a testbed that resembles a real DCN topology.

Expanding across time to deliver bandwidth efficiency and low latency

Opera is a dynamic network that delivers latency-sensitive traffic quickly by relying on multi-hop forwarding, in the same way as expander-graph-based approaches, but provides near-optimal bandwidth for bulk flows through direct forwarding over time-varying source-to-destination circuits.