Cray Cascade: A scalable HPC system based on a Dragonfly network

@article{Faanes2012CrayCA,
  title={Cray Cascade: A scalable HPC system based on a Dragonfly network},
  author={Greg Faanes and A. Bataineh and D. Roweth and T. Court and E. Froese and Robert Alverson and Tim Johnson and Joe Kopnick and Mike Higgins and James Reinhard},
  journal={2012 International Conference for High Performance Computing, Networking, Storage and Analysis},
  year={2012},
  pages={1--9}
}
Higher global bandwidth requirement for many applications and lower network cost have motivated the use of the Dragonfly network topology for high performance computing systems. [...] We describe a set of advanced features supporting both mainstream high performance computing applications and emerging global address space programming models. We present a combination of performance results from prototype systems and simulation data for large systems. We demonstrate the value of the Dragonfly topology…
Evaluating System Parameters on a Dragonfly using Simulation and Visualization
TLDR
The dragonfly topology is becoming a popular choice for building high-radix, low-diameter networks with high-bandwidth links, and the impact of various system parameters on network throughput is studied to better understand inter-job interference.
Maximizing Throughput on a Dragonfly Network
TLDR
This paper aims at analyzing the behavior of a machine built using a dragonfly network for various routing strategies, job placement policies, and application communication patterns based on a novel model that predicts traffic on individual links for direct, indirect, and adaptive routing strategies.
Evaluating HPC Networks via Simulation of Parallel Workloads
TLDR
An evaluation and comparison of three topologies that are popular for building interconnection networks in large-scale supercomputers (torus, fat-tree, and dragonfly) is presented, and it is shown that different topologies are superior in different scenarios.
Analyzing Inter-Job Contention in Dragonfly Networks
Interconnection networks are increasing in importance as node counts increase in high-end machines. To achieve better application performance, newer supercomputers frequently have interconnects with…
Performance Measurements of the NERSC Cray Cascade System
Cray began delivery of their next generation XC30 supercomputer systems in late 2012. One of the first systems, “Edison,” was delivered to NERSC and in this paper we present preliminary performance…
Task Mapping on Complex Computer Network Topologies for Improved Performance
The increase in flop/s capacity and memory bandwidth on-node at a higher rate than the inter-node link bandwidth is making many large-scale parallel applications communication-bound, meaning their…
Analyzing Network Health and Congestion in Dragonfly-Based Supercomputers
TLDR
A functional network simulator, Damselfly, is developed to model the network behavior of Cray Cascade and a visual analytics tool, DragonView, is used to analyze the simulation output to develop a better understanding of inter-job interference.
Scalable Interconnection Network Models for Rapid Performance Prediction of HPC Applications
  • Kishwar Ahmed, Jason Liu, S. Eidenbenz, Joe Zerr
  • Computer Science
  • 2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS)
  • 2016
TLDR
Three interconnect models are presented for topologies widely used in HPC systems (torus, dragonfly, and fat-tree), based on which one can accurately predict the parallel behavior of large-scale applications.
Unveiling the Interplay Between Global Link Arrangements and Network Management Algorithms on Dragonfly Networks
TLDR
A packet-level simulation framework is introduced to model the performance of HPC applications in detail and to investigate the coupling between global link bandwidth and arrangements, communication pattern and intensity, job allocation and task mapping algorithms, and routing mechanisms in dragonfly topologies.
Interconnection Network: Design Space Exploration of Network for Supercomputers
TLDR
The results show that the hybrid network can have shorter latency for local point-to-point communication than the full fat-tree network, while both networks can have comparable performance for collective communications.

References

SeaStar Interconnect: Balanced Bandwidth for Scalable Performance
TLDR
The SeaStar was designed specifically to support Sandia National Laboratories' ASC Red Storm, a distributed-memory parallel computing platform containing more than 11,000 network end-points and presented designers with several challenging goals that were commensurate with a high-performance network for a system of that scale.
Technology-Driven, Highly-Scalable Dragonfly Topology
TLDR
The dragonfly topology is introduced, which uses a group of high-radix routers as a virtual router to increase the effective radix of the network; the use of selective virtual-channel discrimination and of credit round-trip latency to both sense and signal channel congestion gives throughput and latency that approach those of an ideal adaptive routing algorithm.
A uGNI-Based MPICH2 Nemesis Network Module for the Cray XE
TLDR
The design of a uGNI Netmod for the MPICH2 nemesis subsystem is described and performance data on the Cray XE are presented.
The Cray BlackWidow: a highly scalable vector multiprocessor
TLDR
The BlackWidow system is a distributed shared memory architecture that is scalable to 32K processors, each with a 4-way dispatch scalar execution unit and an 8-pipe vector unit capable of 20.8 Gflops.
The BlackWidow High-Radix Clos Network
TLDR
The radix-64 folded-Clos network of the Cray BlackWidow scalable vector multiprocessor is described, which scales to 32K processors with a worst-case diameter of seven hops, along with the underlying high-radix router microarchitecture and its implementation.
Fat-trees: Universal networks for hardware-efficient supercomputing
  • C. Leiserson
  • Computer Science
  • IEEE Transactions on Computers
  • 1985
TLDR
The author presents a new class of universal routing networks, called fat-trees, which might be used to interconnect the processors of a general-purpose parallel supercomputer, and proves that a fat-tree of a given size is nearly the best routing network of that size.
The Gemini System Interconnect
The Gemini System Interconnect is a new network for Cray’s supercomputer systems. It provides improved network functionality, latency and issue rate. Latency is reduced with OS bypass for sends and…
The IBM Blue Gene/Q Interconnection Fabric
TLDR
This article describes the IBM Blue Gene/Q interconnection network and message unit, which has new routing algorithms and techniques to parallelize the injection and reception of packets in the network interface.
The PERCS High-Performance Interconnect
TLDR
The Blue Waters System, which is being constructed at NCSA, is an exemplar large-scale PERCS installation that is expected to deliver sustained petascale performance over a wide range of applications.
A Scheme for Fast Parallel Communication
TLDR
There is a distributed randomized algorithm that can route every packet to its destination without two packets passing down the same wire at any one time, and finishes within time $O(\log N)$ with overwhelming probability for all such routing requests.