- Shailender Chaudhry, Robert Cypher, Magnus Ekman, Martin Karlsson, Anders Landin, Sherman Yip +2 others
- IEEE Micro
- 2009

- Vasanth Bala, Shlomo Kipnis, Marc Snir, Jehoshua Bruck, Robert Cypher, Pablo Elustondo +2 others
- IPPS
- 1994

A collective communication library for parallel computers includes frequently used operations such as broadcast, reduce, scatter, gather, concatenate, synchronize, and shift. Such a library provides users with a convenient programming interface, efficient communication operations, and the advantage of portability. A library of this nature, the Collective…

Many parallel computers consist of processors connected in the form of a d-dimensional mesh or hypercube. Two- and three-dimensional meshes have been shown to be efficient in manipulating images and dense matrices, whereas hypercubes have been shown to be well suited to divide-and-conquer algorithms requiring global communication. However, even a single…

This paper examines the amount of communication that is required for performing mutual exclusion. It is assumed that n processors communicate via accesses to a shared memory that is physically distributed among the processors. We consider the possibility of creating a scalable mutual exclusion protocol that requires only a constant amount of communication…

- Shailender Chaudhry, Robert Cypher, Magnus Ekman, Martin Karlsson, Anders Landin, Sherman Yip +2 others
- ISCA
- 2009

This paper presents Simultaneous Speculative Threading (SST), which is a technique for creating high-performance area- and power-efficient cores for chip multiprocessors. SST hardware dynamically extracts two threads of execution from a single sequential program (one consisting of a load miss and its dependents, and the other consisting of the instructions…

This paper studies the behavior of scientific applications running on distributed memory parallel computers. Our goal is to quantify the floating point, memory, I/O and communication requirements of highly parallel scientific applications that perform explicit communication. In addition to quantifying these requirements for fixed problem sizes and numbers…

This paper presents a deterministic sorting algorithm, called Sharesort, that sorts n records on an n processor hypercube, shuffle-exchange or cube-connected cycles in O(log n (log log n)^2) time in the worst case. The algorithm requires only a constant amount of storage at each processor. The fastest previous deterministic algorithm for this problem was…

In this paper we present routing algorithms that are universal in the sense that they route messages along arbitrary (simple) paths in arbitrary networks. The algorithms are analyzed in terms of the number of messages being routed, the maximum number of messages that must cross any edge in the network (edge congestion), the maximum number of edges that a…

We examine the issue of running algorithms on a hypercube which has both node and edge faults, and we assume a worst case distribution of the faults. We prove that for any constant c, an n-dimensional hypercube (n-cube) with n^c faulty components contains a fault-free subgraph that can implement a large class of hypercube algorithms with only a constant…