Fat-trees: Universal networks for hardware-efficient supercomputing
@article{Leiserson1985FattreesUN, title={Fat-trees: Universal networks for hardware-efficient supercomputing}, author={Charles E. Leiserson}, journal={IEEE Transactions on Computers}, year={1985}, volume={C-34}, pages={892--901} }
The author presents a new class of universal routing networks, called fat-trees, which might be used to interconnect the processors of a general-purpose parallel supercomputer. A fat-tree routing network is parameterized not only in the number of processors, but also in the amount of simultaneous communication it can support. Since communication can be scaled independently from the number of processors, substantial hardware can be saved for such applications as finite-element analysis without…
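The abstract's central point is that a fat-tree is parameterized both by the number of processors and by the communication it supports. A small capacity calculator makes this concrete. The sketch below assumes $n$ processors at the leaves of a complete binary tree ($n$ a power of two) and a root channel capacity $w$. For a channel $k$ levels below the root it uses the capacity schedule $\min(\lceil n/2^k \rceil, \lceil w/2^{2k/3} \rceil)$. Treat that formula as our assumption about the universal fat-tree, not a quotation from the paper. The function names are hypothetical.

```python
import math


def channel_capacity(n: int, w: int, level: int) -> int:
    """Assumed capacity of a channel `level` steps below the root (root = 0).

    A channel never needs more wires than there are processors beneath it
    (n / 2**level). Otherwise capacity shrinks by 2**(2/3) per level, which
    mirrors a three-dimensional volume argument: halving a region's volume
    shrinks its surface, and so the wires that can cross it, by about 2**(2/3).
    """
    below = math.ceil(n / 2 ** level)
    surface_limited = math.ceil(w / 2 ** (2 * level / 3))
    return min(below, surface_limited)


def capacities(n: int, w: int) -> list[int]:
    """Capacities from the root channel (index 0) down to the leaf channels."""
    if n <= 0 or n & (n - 1):
        raise ValueError("this sketch assumes n is a power of two")
    height = n.bit_length() - 1
    return [channel_capacity(n, w, k) for k in range(height + 1)]


print(capacities(64, 16))  # [16, 11, 7, 4, 3, 2, 1]
print(capacities(64, 64))  # [64, 32, 16, 8, 4, 2, 1]
```

Both calls use the same 64 processors. With $w = 64$, capacity halves at every level. With $w = 16$, the channels in the upper levels are trimmed and the two lowest levels are unchanged. This is the sense in which communication is scaled independently of the processor count.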
1,431 Citations
Recursively scalable fat-trees as interconnection networks
- Computer Science
- Proceeding of 13th IEEE Annual International Phoenix Conference on Computers and Communications
- 1994
This work proposes a new interconnection network for a massively parallel computer based on the QROOOl Data Stream Controller Interface, an integrated circuit produced by National Semiconductor, which can sustain a throughput of up to 180 MBytes/sec.
Using Fat-trees to Maximize the Number of Processors in a Massively Parallel Computer
- Computer Science
- 1993
This work investigates the problem of maximizing the number of processors in a massively parallel computer when the degree of the internal nodes and the diameter of the network are physically constrained, and describes a novel interconnection network in which each internal node of the fat-tree is a ring.
Efficient Interconnection Schemes for VLSI and Parallel Computation
- Computer Science
- 1989
This thesis shows that networks based on Leiserson's fat-tree architecture are nearly as good as any network built in a comparable amount of physical space.
The fat-stack and universal routing in interconnection networks
- Computer Science
- J. Parallel Distributed Comput.
- 2004
Universal routing in distributed networks
- Computer Science
- 11th International Conference on Parallel and Distributed Systems (ICPADS'05)
- 2005
The universality proof shows that a fat-stack of area $\Theta(A)$ can simulate any competing network of area $A$ with $O(\log^{3/2} A)$ overhead, independently of wire delay. This implies that a fat-stack of a given size is nearly the best routing network of that size.
A Mesh-of-Trees Interconnection Network for Single-Chip Parallel Processing
- Computer Science
- IEEE 17th International Conference on Application-specific Systems, Architectures and Processors (ASAP'06)
- 2006
It is shown that on-chip interconnection networks can provide higher bandwidth between processors and a shared first-level cache than previously considered possible. This enables greater scalability for memory architectures that require such bandwidth.
Reducing complexity in tree-like computer interconnection networks
- Computer Science
- Parallel Comput.
- 2010
NAP(No ALU Processor): The Great Communicator
- Computer Science
- J. Parallel Distributed Comput.
- 1990
Randomized routing on fat-trees
- Computer Science
- 26th Annual Symposium on Foundations of Computer Science (sfcs 1985)
- 1985
In a VLSI-like model where hardware cost is equated with physical volume, the routing algorithm is used to demonstrate that fat-trees are universal routing networks in the sense that any routing network can be efficiently simulated by a fat-tree of comparable hardware cost.
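The routing algorithm itself is randomized. Even so, every message follows the unique path in the underlying tree through the lowest common ancestor of its source and destination, so the message set alone fixes each channel's load. The sketch below computes those paths and per-direction message counts for a complete binary tree whose leaves are the processors. Dividing a channel's count by its capacity gives the load factor used in Leiserson's fat-tree analysis. The leaf-numbering scheme and function names are our assumptions, and the sketch does not model the paper's randomized scheduling.

```python
from collections import Counter


def route(src: int, dst: int, height: int) -> list[tuple[int, int]]:
    """Tree nodes (level, index) visited from leaf src to leaf dst.

    Leaves sit at level 0 and are numbered 0 .. 2**height - 1; the node at
    level h above leaf x has index x >> h. The message climbs until the
    subtrees of src and dst coincide (their lowest common ancestor), then
    descends toward dst.
    """
    if not (0 <= src < 2 ** height and 0 <= dst < 2 ** height):
        raise ValueError("leaf index out of range")
    turn = (src ^ dst).bit_length()  # level of the lowest common ancestor
    up = [(h, src >> h) for h in range(turn + 1)]
    down = [(h, dst >> h) for h in range(turn - 1, -1, -1)]
    return up + down


def channel_loads(messages, height: int) -> Counter:
    """Count messages per (channel, direction); a channel is named by its lower endpoint."""
    loads = Counter()
    for src, dst in messages:
        path = route(src, dst, height)
        for (h1, i1), (h2, i2) in zip(path, path[1:]):
            if h2 > h1:
                loads[((h1, i1), "up")] += 1
            else:
                loads[((h2, i2), "down")] += 1
    return loads


reversal = [(0, 3), (1, 2), (2, 1), (3, 0)]
print(route(0, 3, 2))  # [(0, 0), (1, 0), (2, 0), (1, 1), (0, 3)]
print(channel_loads(reversal, 2)[((1, 0), "up")])  # 2
```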
Fat-tree for local area multiprocessors
- Computer Science
- Proceedings of 9th International Parallel Processing Symposium
- 1995
This paper examines the use of a fat-tree topology for a (possibly distributed) switch and presents results on latency, throughput, buffer requirements, and the effect of cable length.
References
Randomized routing on fat-trees
- Computer Science
- 26th Annual Symposium on Foundations of Computer Science (sfcs 1985)
- 1985
In a VLSI-like model where hardware cost is equated with physical volume, the routing algorithm is used to demonstrate that fat-trees are universal routing networks in the sense that any routing network can be efficiently simulated by a fat-tree of comparable hardware cost.
How to assemble tree machines
- Computer Science
- 1984
The authors give a linear-area chip of m processors and only four off-chip connections which can be used as the sole building block to construct an arbitrarily large complete binary tree.
The cube-connected cycles: a versatile network for parallel computation
- Computer Science
- CACM
- 1981
This work describes in detail how to program the cube-connected cycles to efficiently solve a large class of problems, including the fast Fourier transform, sorting, permutations, and derived algorithms.
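The cube-connected cycles can be described by a neighbor function. The sketch below uses the common $(w, i)$ labeling, which is our choice of presentation and not necessarily the paper's notation. Each hypercube corner is replaced by a cycle, so every node has constant degree while the network keeps hypercube-like connectivity. The function name is hypothetical.

```python
def ccc_neighbors(w: int, i: int, d: int) -> list[tuple[int, int]]:
    """Neighbors of node (w, i) in the d-dimensional cube-connected cycles.

    Each corner w of a d-dimensional hypercube (a d-bit integer) is replaced
    by a cycle of d nodes (w, 0), ..., (w, d - 1); node (w, i) keeps the
    hypercube edge along dimension i, so every node has degree 3.
    """
    if d < 3:
        raise ValueError("cycles need at least 3 nodes for distinct neighbors")
    return [
        (w, (i + 1) % d),   # next node on the same cycle
        (w, (i - 1) % d),   # previous node on the same cycle
        (w ^ (1 << i), i),  # lateral edge: flip bit i of the corner address
    ]


d = 3
nodes = [(w, i) for w in range(2 ** d) for i in range(d)]
edges = {frozenset((u, v)) for u in nodes for v in ccc_neighbors(*u, d)}
print(len(nodes), len(edges))  # 24 36
```

The degree stays at 3 for every $d$, whereas a plain hypercube's degree grows with $d$.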
A Scheme for Fast Parallel Communication
- Computer Science
- SIAM J. Comput.
- 1982
There is a distributed randomized algorithm that can route every packet to its destination without two packets passing down the same wire at any one time, and finishes within time $O(\log N)$ with overwhelming probability for all such routing requests.
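The idea behind this result is usually presented as two-phase randomized routing on a hypercube. Each packet first travels to a uniformly random intermediate node and only then to its true destination, which makes any fixed permutation behave much like random traffic. The sketch below assumes a hypercube of $2^{dim}$ nodes and greedy bit-fixing paths. It only computes routes; it does not simulate queues, wire contention, or the $O(\log N)$ time bound. The function names are hypothetical.

```python
import random


def bit_fixing_path(src: int, dst: int, dim: int) -> list[int]:
    """Greedy hypercube path that corrects differing address bits, lowest first."""
    path = [src]
    node = src
    for b in range(dim):
        if ((node ^ dst) >> b) & 1:
            node ^= 1 << b
            path.append(node)
    return path


def two_phase_route(src: int, dst: int, dim: int, rng: random.Random) -> list[int]:
    """Route via a uniformly random intermediate node, then on to dst."""
    mid = rng.randrange(2 ** dim)
    first = bit_fixing_path(src, mid, dim)
    second = bit_fixing_path(mid, dst, dim)
    return first + second[1:]  # drop the duplicated intermediate node


rng = random.Random(1)
print(bit_fixing_path(0b000, 0b101, 3))  # [0, 1, 5]
print(two_phase_route(0, 7, 3, rng))
```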
Area-Efficient VLSI Computation
- Computer Science
- 1983
The two parts of this thesis address the contribution of communication to the performance and area of an integrated circuit. They offer mathematical views of an engineering discipline: techniques of theoretical computer science (e.g., divide and conquer, automata theory, asymptotic analysis) applied to integrated circuit computation.
Global wire routing in two-dimensional arrays
- Computer Science, Mathematics
- 24th Annual Symposium on Foundations of Computer Science (sfcs 1983)
- 1983
A central result of this paper is a “rounding algorithm” for obtaining integral approximations to a real solution vector $x$ of a system of linear equations with coefficient matrix $A$.
Area-Efficient Graph Layouts (for VLSI).
- Computer Science
- FOCS 1980
- 1980
An algorithm is given that produces VLSI layouts for classes of graphs that have good separator theorems. In particular, any planar graph of $n$ vertices has an $O(n \lg^2 n)$-area layout, and any tree of $n$ vertices can be laid out in linear area.
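The linear-area claim for trees can be illustrated for a complete binary tree with $n$ leaves using the classic H-tree layout. This is our own illustration, not the separator-based algorithm of the paper. Four H-tree layouts for $n/4$ leaves are placed in the quadrants of a square. Constant-width gutters between them carry the wires to the new root and its two children. The side length $S(n)$ therefore satisfies:

```latex
S(n) = 2\,S(n/4) + c, \qquad S(1) = c_0
\quad\Longrightarrow\quad
S(4^k) = 2^k\,(c_0 + c) - c = \Theta(\sqrt{n}),
\qquad \text{area} = S(n)^2 = \Theta(n).
```

The tree has $2n - 1$ vertices, so the area is also linear in the number of vertices.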
Universal schemes for parallel communication
- Computer Science
- STOC '81
- 1981
This paper shows that there exists an $N$-processor computer that can simulate arbitrary $N$-processor parallel computations with only a factor of $O(\log N)$ loss of runtime efficiency. It also isolates a combinatorial problem that lies at the heart of this question.
Polymorphic Arrays: A Novel VLSI Layout for Systolic Computers
- Computer Science
- FOCS
- 1984
This paper proposes a novel architecture for massively parallel systolic computers, which is based on results from lattice theory, and guarantees exceptional load uniformity for rectangular process arrays of arbitrary sizes.
The Complexity Theory of Switching Networks.
- Computer Science
- 1973
The author considers switching networks of the type used for line switching in communication networks or for reconfiguration of modular computer systems, and examines the complexity…