Learn More
A collective communication library for parallel computers includes frequently used operations such as broadcast, reduce, scatter, gather, concatenate, synchronize, and shift. Such a library provides users with a convenient programming interface , efficient communication operations, and the advantage of portability. A library of this nature, the Collective(More)
Many parallel computers consist of processors connected in the form of a d-dimensional mesh or hypercube. Two-and three-dimensional meshes have been shown to be efficient in manipulating images and dense matrices, whereas hypercubes have been shown to be well suited to divide-and-conquer algorithms requiring global communication. However, even a single(More)
This paper presents Simultaneous Speculative Threading (SST), which is a technique for creating high-performance area- and power-efficient cores for chip multiprocessors. SST hardware dynamically extracts two threads of execution from a single sequential program (one consisting of a load miss and its dependents, and the other consisting of the instructions(More)
This paper studies the behavior of scientific applications running on distributed memory parallel computers. Our goal is to quantify the floating point, memory, I/O and communication requirements of highly parallel scientific applications that perform explicit communication. In addition to quantifying these requirements for fixed problem sizes and numbers(More)
This paper presents a deterministic sorting algorithm, called Sharesort, that sorts n records on an n processor hypercube, shuffle-exchange or cube-connected cycles in O(log n(loglog n) 2) time in the worst case. The algorithm requires only a constant amount of storage at each processor. The fastest previous deterministic algorithm for this problem was(More)
In this paper we present routing algorithms that are tmi-versal in the sense that they route messages along arbitrary (simple) paths in arbitrary networks. The algorithms are analyzed in terms of the number of messages being routed, the maximum number of messages that must cross any edge in the network (edge congestion), the maximum number of edges that a(More)
We examine the issue of running algorithms on a hypercube which has both node and edge faults, and we assume a worst case distribution of the faults. We prove that for any constant c, an n-dimensional hypercube (n-cube) with n c faulty components contains a fault-free subgraph that can implement a large class of hypercube algorithms with only a constant(More)