Taraflops into laptops

  • Steven J. Wallach
  • Published 1994 in
    IEEE Parallel & Distributed Technology: Systems…

Abstract

Level          Clocks   Slowdown
Register       1        -
Level 1 cache  2-3      2-3
Level 2 cache  6-10     2-3
Store          20+      2-3

So each level down the hierarchy is a factor of 2 or 3 slower than the previous one. If we view store accessed over the switch as the next level of the memory hierarchy, this implies that we want to achieve an access through the switch in around 40-60 CPU cycles (the 20+ clocks of store times a further factor of 2-3), that is, in 400-600 nanoseconds for a 100-MHz clocked CPU (probably a low estimate). ATM is currently viewed as the lowest-latency nonproprietary switch structure, but such switches have a single-switch latency of around 1.25 µsec; this implies a full switch-network latency of around 4 µsec for a 256-node machine, a factor of 10 too large.

So far I have ignored the latency in getting from a user request out to the switch network. If the network is accessed as a communications device (as will happen with a naive ATM interface), this will involve system calls and the kernel of the operating system. Many thousands of instructions will be executed, translating…
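The latency-budget arithmetic above can be checked with a short Python sketch. It is only an illustration, not the author's model. It assumes the lower bound of the store figure (20 clocks), a further 2-3x step for the switch, a 100-MHz clock, and the abstract's ~4 µsec full-network ATM latency. All names are hypothetical.

```python
"""Back-of-the-envelope latency budget for store accessed over a switch.

Assumption: the switch is treated as one more 2-3x level below local store,
using the lower bound (20 clocks) of the "20+" store figure.
"""

CLOCK_HZ = 100e6                  # 100-MHz CPU clock
CYCLE_NS = 1e9 / CLOCK_HZ         # 10 ns per cycle

STORE_CLOCKS = 20                 # lower bound of "20+" clocks for store
SLOWDOWN = (2, 3)                 # each level is 2-3x slower than the last

ATM_NETWORK_NS = 4000.0           # ~4 usec full-network latency, 256 nodes


def switch_budget_cycles(store_clocks: int, slowdown: tuple[int, int]) -> tuple[int, int]:
    """Target cycle range if switch access is one level below store."""
    lo, hi = slowdown
    return store_clocks * lo, store_clocks * hi


def cycles_to_ns(cycles: float) -> float:
    """Convert CPU cycles to nanoseconds at CLOCK_HZ."""
    return cycles * CYCLE_NS


lo_cyc, hi_cyc = switch_budget_cycles(STORE_CLOCKS, SLOWDOWN)
lo_ns, hi_ns = cycles_to_ns(lo_cyc), cycles_to_ns(hi_cyc)

# Ratio of the ATM network latency to the tightest (400 ns) target.
excess = ATM_NETWORK_NS / lo_ns

print(f"target: {lo_cyc}-{hi_cyc} cycles = {lo_ns:.0f}-{hi_ns:.0f} ns")
print(f"ATM network is {excess:.0f}x the {lo_ns:.0f} ns target")
```

Measured against the looser 600 ns end of the budget, the gap is still roughly 7x. Either way the ATM figure misses by close to an order of magnitude.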

DOI: 10.1109/M-PDT.1994.329787

Cite this paper

@article{Wallach1994TaraflopsIL,
  title={Taraflops into laptops},
  author={Steven J. Wallach},
  journal={IEEE Parallel & Distributed Technology: Systems & Applications},
  year={1994},
  volume={2},
  pages={8-}
}