Fat-trees: Universal networks for hardware-efficient supercomputing

@article{Leiserson1985FattreesUN,
  title={Fat-trees: Universal networks for hardware-efficient supercomputing},
  author={Charles E. Leiserson},
  journal={IEEE Transactions on Computers},
  year={1985},
  volume={C-34},
  pages={892--901}
}
  • C. Leiserson
  • Published 1 October 1985
  • Computer Science
  • IEEE Transactions on Computers
The author presents a new class of universal routing networks, called fat-trees, which might be used to interconnect the processors of a general-purpose parallel supercomputer. A fat-tree routing network is parameterized not only in the number of processors, but also in the amount of simultaneous communication it can support. Since communication can be scaled independently from the number of processors, substantial hardware can be saved for such applications as finite-element analysis without… 

Recursively scalable fat-trees as interconnection networks
TLDR
This work proposes a new interconnection network for a massively parallel computer based on the QR0001 Data Stream Controller Interface, an integrated circuit produced by National Semiconductor, which can sustain a throughput of up to 180 MBytes/sec.
Using Fat-trees to Maximize the Number of Processors in a Massively Parallel Computer
TLDR
This work investigates the problem of maximizing the number of processors in a massively parallel computer when the degree of the internal nodes and the diameter of the network are physically constrained, and describes a novel interconnection network in which each internal node of the fat-tree is a ring.
Efficient Interconnection Schemes for VLSI and Parallel Computation
TLDR
This thesis shows that networks based on Leiserson's fat-tree architecture are nearly as good as any network built in a comparable amount of physical space.
The fat-stack and universal routing in interconnection networks
  • K. Chen, E. Sha
  • Computer Science
    J. Parallel Distributed Comput.
  • 2004
Universal routing in distributed networks
  • K. Chen, E. Sha, Bin Xiao
  • Computer Science
    11th International Conference on Parallel and Distributed Systems (ICPADS'05)
  • 2005
TLDR
The universality proof shows that a fat-stack of area $\Theta(A)$ can simulate any competing network of area $A$ with $O(\log^{3/2} A)$ overhead independently of wire delay and implies that the fat-stack of a given size is nearly the best routing network of that size.
A Mesh-of-Trees Interconnection Network for Single-Chip Parallel Processing
  • Aydin O. Balkan, G. Qu, U. Vishkin
  • Computer Science
    IEEE 17th International Conference on Application-specific Systems, Architectures and Processors (ASAP'06)
  • 2006
TLDR
It is shown that on-chip interconnection networks can provide higher bandwidth between processors and shared first-level cache than previously considered possible, facilitating greater scalability of memory architectures that require that.
Reducing complexity in tree-like computer interconnection networks
NAP (No ALU Processor): The Great Communicator
Randomized routing on fat-trees
TLDR
In a VLSI-like model where hardware cost is equated with physical volume, the routing algorithm is used to demonstrate that fat-trees are universal routing networks in the sense that any routing network can be efficiently simulated by a fat-tree of comparable hardware cost.
Fat-tree for local area multiprocessors
  • Qiang Li, D. Gustavson
  • Computer Science
    Proceedings of 9th International Parallel Processing Symposium
  • 1995
TLDR
This paper examines the use of a fat-tree topology for this (possibly distributed) switch and results are presented to show the latency, throughput, buffer requirements, and the effect of cable length.

References

Randomized routing on fat-trees
TLDR
In a VLSI-like model where hardware cost is equated with physical volume, the routing algorithm is used to demonstrate that fat-trees are universal routing networks in the sense that any routing network can be efficiently simulated by a fat-tree of comparable hardware cost.
How to assemble tree machines
TLDR
The authors give a linear-area chip of m processors and only four off-chip connections which can be used as the sole building block to construct an arbitrarily large complete binary tree.
The cube-connected cycles: a versatile network for parallel computation
TLDR
This work describes in detail how to program the cube-connected cycles for efficiently solving a large class of problems that include Fast Fourier transform, sorting, permutations, and derived algorithms.
A Scheme for Fast Parallel Communication
TLDR
There is a distributed randomized algorithm that can route every packet to its destination without two packets passing down the same wire at any one time, and finishes within time $O(\log N)$ with overwhelming probability for all such routing requests.
Area-Efficient VLSI Computation
TLDR
The two parts of this thesis address the contribution of communication to the performance and area of an integrated circuit, and provide mathematical views of an engineering discipline: techniques of theoretical computer science--e.g., divide and conquer, automata theory, asymptotic analysis--applied to integrated circuit computation.
Global wire routing in two-dimensional arrays
TLDR
A central result of this paper is a “rounding algorithm” for obtaining integral approximations to solutions of linear equations for matrix A and real vector x.
Area-Efficient Graph Layouts (for VLSI).
TLDR
An algorithm is given that produces VLSI layouts for classes of graphs that have good separator theorems and shows in particular that any planar graph of n vertices has an $O(n \lg^2 n)$ area layout and that any tree of n vertices can be laid out in linear area.
Universal schemes for parallel communication
TLDR
This paper shows that there exists an N-processor computer that can simulate arbitrary N-processor parallel computations with only a factor of $O(\log N)$ loss of runtime efficiency, and isolates a combinatorial problem that lies at the heart of this question.
Polymorphic Arrays: A Novel VLSI Layout for Systolic Computers
TLDR
This paper proposes a novel architecture for massively parallel systolic computers, which is based on results from lattice theory, and guarantees exceptional load uniformity for rectangular process arrays of arbitrary sizes.
The Complexity Theory of Switching Networks.
Abstract: The author considers switching networks of the type used for line switching in communication networks or for reconfiguration of modular computer systems, and examines the complexity