A collective communication library for parallel computers includes frequently used operations such as broadcast, reduce, scatter, gather, concatenate, synchronize, and shift. Such a library provides users with a convenient programming interface, efficient communication operations, and the advantage of portability. A library of this nature, the Collective …
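To make the operations named above concrete, here is a minimal single-process sketch that models each rank's local data as one entry of a Python list. It is only an illustration of the usual semantics of these collectives, not the interface of the library the abstract describes; the function names (`broadcast`, `reduce_to_root`, `scatter`, `gather`, `shift`) and the list-per-rank model are assumptions made for this example.

```python
from functools import reduce as fold
import operator

# Hypothetical model: local[r] is the value held by rank r.

def broadcast(local, root):
    """Every rank ends up holding a copy of the root's value."""
    return [local[root] for _ in local]

def reduce_to_root(local, op, root):
    """Combine all ranks' values with op; only the root holds the result."""
    combined = fold(op, local)
    return [combined if rank == root else None for rank in range(len(local))]

def scatter(root_data, nprocs):
    """Split the root's list into nprocs equal chunks; rank r gets chunk r.
    Assumes len(root_data) is divisible by nprocs."""
    size = len(root_data) // nprocs
    return [root_data[r * size:(r + 1) * size] for r in range(nprocs)]

def gather(local, root):
    """The root collects every rank's value, in rank order."""
    return [list(local) if rank == root else None for rank in range(len(local))]

def shift(local, offset=1):
    """Cyclic shift: rank r receives the value of rank (r - offset) mod p."""
    p = len(local)
    return [local[(rank - offset) % p] for rank in range(p)]

if __name__ == "__main__":
    print(broadcast([1, 2, 3, 4], root=2))
    print(reduce_to_root([1, 2, 3, 4], operator.add, root=0))
    print(scatter(list(range(8)), nprocs=4))
    print(shift([10, 20, 30], offset=1))
```

A real library would move data between processors over the network; the sketch only shows what each rank holds before and after each operation.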
This paper presents a deterministic sorting algorithm, called Sharesort, that sorts n records on an n-processor hypercube, shuffle-exchange, or cube-connected cycles in O(log n (log log n)^2) time in the worst case. The algorithm requires only a constant amount of storage at each processor. The fastest previous deterministic algorithm for this problem was …
In this paper we present routing algorithms that are universal in the sense that they route messages along arbitrary (simple) paths in arbitrary networks. The algorithms are analyzed in terms of the number of messages being routed, the maximum number of messages that must cross any edge in the network (edge congestion), the maximum number of edges that a …
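As a small illustration of the edge congestion measure defined above, the following sketch counts how many message paths cross each edge and returns the maximum. It assumes each path is given as a list of node names and that edges are undirected; both the representation and the function name `edge_congestion` are choices made for this example, not taken from the paper.

```python
from collections import Counter

def edge_congestion(paths):
    """Return the maximum number of paths that cross any single edge.

    Each path is a list of nodes; consecutive nodes form an edge.
    Edges are treated as undirected (an assumption of this sketch).
    """
    load = Counter()
    for path in paths:
        for u, v in zip(path, path[1:]):
            load[frozenset((u, v))] += 1
    return max(load.values(), default=0)

if __name__ == "__main__":
    # Three messages; the edge between "b" and "c" carries all of them.
    paths = [["a", "b", "c"], ["b", "c", "d"], ["c", "b"]]
    print(edge_congestion(paths))
```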
This paper presents Simultaneous Speculative Threading (SST), which is a technique for creating high-performance, area- and power-efficient cores for chip multiprocessors. SST hardware dynamically extracts two threads of execution from a single sequential program (one consisting of a load miss and its dependents, and the other consisting of the instructions …
This paper studies the behavior of scientific applications running on distributed-memory parallel computers. Our goal is to quantify the floating-point, memory, I/O, and communication requirements of highly parallel scientific applications that perform explicit communication. In addition to quantifying these requirements for fixed problem sizes and numbers …