The network architecture of the Connection Machine CM-5 (extended abstract)

@inproceedings{Leiserson1992TheNA,
  title={The network architecture of the Connection Machine CM-5 (extended abstract)},
  author={Charles E. Leiserson and Zahi S. Abuhamdeh and David C. Douglas and Carl R. Feynman and Mahesh N. Ganmukhi and Jeffrey V. Hill and W. Daniel Hillis and Bradley C. Kuszmaul and Margaret A. St. Pierre and David S. Wells and Monica C. Wong and Shaw-Wen Yang and Robert C. Zak},
  booktitle={SPAA '92},
  year={1992}
}
The Connection Machine Model CM-5 Supercomputer is a massively parallel computer system designed to offer performance in the range of 1 teraflops (10^12 floating-point operations per second). The CM-5 obtains its high performance while offering ease of programming, flexibility, and reliability. The machine contains three communication networks: a data network, a control network, and a diagnostic network. This paper describes the organization of these three networks and how they contribute to the…

The CM-5 Connection Machine: a scalable supercomputer
TLDR
The CM-5 Connection Machine is a scalable homogeneous multiprocessor designed for large-scale scientific and business applications and it is believed that architectures of this type will replace most other forms of supercomputing in the foreseeable future.
Communication and computation performance of the CM-5
TLDR
To assess the scalability of the CM-5's computation and interprocessor communication rates, a series of benchmarks was used to measure the performance of the CM-5 data and control networks, the node vector units, and the balance of computation and communication.
Network performance under physical constraints
  • F. Petrini, M. Vanneschi
  • Computer Science
    Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)
  • 1997
TLDR
This paper uses a detailed simulation model to compare the communication performance of fat-trees and low-dimensional cubes in the design of interconnection networks for massively parallel computers.
Toward high communication performance through compiled communications on a circuit switched interconnection network
  • F. Cappello, C. Germain
  • Computer Science
    Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture
  • 1995
TLDR
A new interconnection network principle for massively parallel architectures in numerical computation is discussed; it combines very high bandwidth, very low latency, performance that is independent of communication pattern and network load, and a performance improvement proportional to the improvement in hardware performance.
Adapting the Network Interface for High-Performance Computing: The CNI Approach
TLDR
This paper presents the CNI, or cluster network interface, which achieves the twin goals of low latency and high bandwidth and efficiently supports multiple programming paradigms for programming generality.
Static Communications in Parallel Scientific Programs
TLDR
This paper presents experimental analysis of communications in parallel scientific programs, showing that most communication patterns of application programs are determined at compile-time, and sketches an execution model intended to exploit this knowledge.
The SP2 High-Performance Switch
TLDR
The switch architecture is examined and an overview of its support software is presented, which uses a variety of techniques to improve bandwidth and offload communication tasks from the node processor.
PowerMANNA: a parallel architecture based on the PowerPC MPC620
  • P. Behr, S. Pletner, A. Sodan
  • Computer Science
    Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550)
  • 2000
TLDR
The paper presents PowerMANNA, a distributed-memory parallel computer system based on the 64-bit PowerPC processor MPC620. It incorporates important architectural concepts that allow it to exploit the performance of modern superscalar microprocessors in the context of massively parallel supercomputing.
A tightly-coupled processor-network interface
TLDR
The interface architecture reduces communication overhead fivefold in the authors' benchmarks, and most of the performance gain comes from simple, low-cost hardware mechanisms for fast dispatching on, forwarding of, and replying to messages.
Multicast virtual topologies for collective communication in MPCs and ATM clusters
TLDR
The paper describes the practical issues of using these methods in wormhole-routed massively parallel computers (MPCs) and in workstation clusters connected by Asynchronous Transfer Mode (ATM) networks.

References

Functional VLSI design verification methodology for the CM-5 massively parallel supercomputer
The methodology and techniques developed from the functional verification of five of the VLSI chips used in the CM-5, a massively parallel supercomputer, are described. The verification methodology
The cosmic cube
TLDR
This “Cosmic Cube” computer is a hardware simulation of a future VLSI implementation that will consist of single-chip nodes and offers high degrees of concurrency in applications and suggests that future machines with thousands of nodes are both feasible and attractive.
Fat-trees: Universal networks for hardware-efficient supercomputing
  • C. Leiserson
  • Computer Science
    IEEE Transactions on Computers
  • 1985
TLDR
The author presents a new class of universal routing networks, called fat-trees, which might be used to interconnect the processors of a general-purpose parallel supercomputer, and proves that a fat-tree of a given size is nearly the best routing network of that size.
Very high-speed computing systems
TLDR
The constituents of a system: storage, execution, and instruction handling (branching) are discussed with regard to recent developments and/or systems limitations.
Scalable shared-memory multiprocessor architectures
TLDR
Directory-based and bus-based cache coherence schemes are defined and described, and schemes using presence flags, B pointers, and linked lists are discussed.
Principles and lessons in packet communications
TLDR
The need for efficient resource sharing is identified and the original and recurring difficulties the authors had in achieving this goal in packet networks are reviewed.
Vector Models for Data-Parallel Computing
TLDR
A model of parallelism that extends and formalizes the Data-Parallel model on which the Connection Machine and other supercomputers are based is described, and it is argued that data-parallel models are not only practical and can be applied to a surprisingly wide variety of problems, they are also well suited for very-high-level languages and lead to a concise and clear description of algorithms and their complexity.
An IEEE 1149.1 compliant testability architecture with internal scan
  • R. Zak, Jeffrey V. Hill
  • Computer Science
    Proceedings 1992 IEEE International Conference on Computer Design: VLSI in Computers & Processors
  • 1992
TLDR
A testability architecture for VLSI devices which is IEEE 1149.1 compliant and includes extensions for partitionable internal scan chains is described, resulting in designs that are amenable to static timing analysis and are extensible to at-speed built-in-self-test (BIST).