We have implemented three parallel sorting algorithms on the Connection Machine Supercomputer model CM-2: Batcher's bitonic sort, a parallel radix sort, and a sample sort similar to Reif and Valiant's flashsort. We have also evaluated the implementation of many other sorting algorithms proposed in the literature. Our computational experiments show that the…
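As an illustration of the first of these algorithms, here is a minimal sequential sketch of Batcher's bitonic sorting network (the function name and loop structure are mine, not the paper's; on the CM-2, each compare-exchange stage would execute across processors in parallel rather than in a loop):

```python
def bitonic_sort(a):
    """Sort a list whose length is a power of two with Batcher's
    bitonic network. Each (k, j) pair below is one network stage;
    all compare-exchanges within a stage are independent."""
    n = len(a)
    assert n & (n - 1) == 0, "length must be a power of two"
    a = list(a)
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:      # compare-exchange distance within the stage
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

The inner loop over `i` is the data-parallel step: every element finds its partner with a bitwise XOR and conditionally swaps, which maps directly onto a SIMD machine.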

- Guy E. Blelloch, Jonathan C. Hardwick, Jay Sipelstein, Marco Zagha, Siddhartha Chatterjee
- J. Parallel Distrib. Comput.
- 1993

This paper gives an overview of the implementation of NESL, a portable nested data-parallel language. This language and its implementation are the first to fully support nested data structures as well as nested data-parallel function calls. These features allow the concise description of parallel algorithms on irregular data, such as sparse matrices and…
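To give a feel for the nested data-parallel style the abstract mentions, here is a Python analogue of how sparse matrix-vector product is written in NESL (the Python comprehension stands in for NESL's apply-to-each; this is my illustration, not code from the paper):

```python
def nested_sparse_matvec(rows, x):
    """Nested data-parallel sparse matrix-vector product: an
    apply-to-each over rows, containing an apply-to-each over each
    row's (column, value) pairs. NESL's flattening transformation
    compiles both nesting levels into flat vector operations."""
    return [sum(v * x[c] for c, v in row) for row in rows]
```

The irregularity (rows of different lengths) is exactly what nested data-parallelism handles concisely, and what the flattening-based implementation makes efficient.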

In this paper we present a new technique for sparse matrix multiplication on vector multiprocessors based on the efficient implementation of a segmented sum operation. We describe how the segmented sum can be implemented on vector multiprocessors such that it both fully vectorizes within each processor and parallelizes across processors. Because of our…
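The segmented-sum idea can be sketched as follows for a CSR sparse matrix-vector product (a minimal NumPy sketch of the technique, with `np.add.at` standing in for the vectorized segmented-sum primitive the paper actually implements; function names are mine):

```python
import numpy as np

def segmented_sum(values, segment_ids, num_segments):
    """Sum `values` within the segments given by `segment_ids`."""
    out = np.zeros(num_segments)
    np.add.at(out, segment_ids, values)  # unbuffered scatter-add
    return out

def csr_matvec(data, col_idx, row_ptr, x):
    """y = A @ x for a CSR matrix: form all elementwise products at
    once, then reduce them with a segmented sum, one segment per row.
    Both steps are flat vector operations, regardless of row lengths."""
    data, col_idx, x = np.asarray(data), np.asarray(col_idx), np.asarray(x)
    nrows = len(row_ptr) - 1
    products = data * x[col_idx]
    row_ids = np.repeat(np.arange(nrows), np.diff(row_ptr))
    return segmented_sum(products, row_ids, nrows)
```

The point of the formulation is that irregular row lengths never appear in the inner loops: everything is expressed over the flat nonzero array, which vectorizes well.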

This paper describes an optimized implementation of a set of *scan* (also called all-prefix-sums) primitives on a single processor of a CRAY Y-MP, and demonstrates that their use leads to greatly improved performance for several applications that cannot be vectorized with existing compiler technology. The algorithm used to implement the scans…
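For readers unfamiliar with the primitive, an exclusive plus-scan is simply (a sequential sketch of the operation's semantics, not the paper's optimized CRAY implementation):

```python
def plus_scan(a):
    """Exclusive all-prefix-sums: out[i] = a[0] + ... + a[i-1],
    with out[0] = 0. Many irregular computations (stream compaction,
    bucket offsets, segmented operations) reduce to this primitive."""
    out, acc = [], 0
    for v in a:
        out.append(acc)
        acc += v
    return out
```

Although the reference semantics are sequential, scans admit efficient vectorized and parallel implementations, which is what makes them useful as building blocks for codes compilers cannot vectorize directly.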

- Marco Zagha, Guy E. Blelloch
- SC
- 1991

We have designed a radix sort algorithm for vector multiprocessors and have implemented the algorithm on the CRAY Y-MP. On one processor of the Y-MP, our sort is over 5 times faster on large sorting problems than the optimized library sort provided by CRAY Research. On eight processors we achieve an additional speedup of almost 5, yielding a routine over 25…
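The structure of a counting-based LSD radix sort can be sketched as below (a plain sequential version for illustration; the paper's contribution is vectorizing and parallelizing the histogram and scatter steps on the Y-MP, and the parameter choices here are mine):

```python
def radix_sort(keys, bits=16, radix_bits=4):
    """LSD radix sort on non-negative integers below 2**bits:
    one stable counting pass per radix_bits-wide digit. The exclusive
    prefix sum over the digit histogram yields each bucket's starting
    offset -- the step that maps naturally onto scan primitives."""
    mask = (1 << radix_bits) - 1
    for shift in range(0, bits, radix_bits):
        count = [0] * (mask + 1)
        for k in keys:                     # histogram of current digit
            count[(k >> shift) & mask] += 1
        offsets, acc = [], 0               # exclusive scan of histogram
        for c in count:
            offsets.append(acc)
            acc += c
        out = [0] * len(keys)
        for k in keys:                     # stable scatter by digit
            d = (k >> shift) & mask
            out[offsets[d]] = k
            offsets[d] += 1
        keys = out
    return keys
```

Stability of each pass is what lets successive digit passes compose into a full sort.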

- Michael J. Witbrock, Marco Zagha
- Parallel Computing
- 1990

Current connectionist simulations require huge computational resources. We describe a neural network simulator for the IBM GF11, an experimental SIMD machine with 566 processors and a peak arithmetic performance of 11 Gigaflops. We present our parallel implementation of the backpropagation learning algorithm, techniques for increasing efficiency,…
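The learning rule being parallelized is standard backpropagation; one update step for a tiny two-layer sigmoid network looks like this (an illustrative NumPy sketch of the textbook rule, not the GF11 simulator's code; shapes and the learning rate are my choices):

```python
import numpy as np

def backprop_step(W1, W2, x, t, lr=0.1):
    """One backpropagation update on a single pattern for a
    2-layer sigmoid network: forward pass, delta computation by the
    chain rule, then outer-product weight updates. Returns the new
    weights and the squared-error loss at the old weights."""
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    h = sig(W1 @ x)                     # hidden activations
    y = sig(W2 @ h)                     # output activations
    dy = (y - t) * y * (1 - y)          # output-layer delta
    dh = (W2.T @ dy) * h * (1 - h)      # hidden-layer delta
    W2 = W2 - lr * np.outer(dy, h)
    W1 = W1 - lr * np.outer(dh, x)
    return W1, W2, 0.5 * np.sum((y - t) ** 2)
```

On a SIMD machine the matrix-vector products and outer-product updates are the operations distributed across processors; the per-step arithmetic is identical.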

- Guy E. Blelloch, Phillip B. Gibbons, Yossi Matias, Marco Zagha
- IEEE Trans. Parallel Distrib. Syst.
- 1995

For years the computation rate of processors has been much faster than the access rate of memory banks, and this divergence in speeds has been constantly increasing in recent years. As a result, several shared-memory multiprocessors consist of more memory banks than processors. The object of this paper is to provide a simple model with only a few parameters…
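One simple source of the bank contention such a model must capture is strided access: a stride that shares a common factor with the bank count concentrates traffic on fewer banks. A toy illustration (my example of the effect, not a formula from the paper):

```python
from math import gcd

def banks_hit(stride, num_banks):
    """Number of distinct banks touched by an address stream with the
    given stride, assuming addresses are interleaved across banks
    round-robin: num_banks / gcd(stride, num_banks). Fewer distinct
    banks means more requests queue at each bank, lowering bandwidth."""
    return num_banks // gcd(stride, num_banks)
```

For example, with 8 interleaved banks a stride-8 stream hammers a single bank, while any odd stride spreads load across all 8.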

- Keith D. Gremban, Gary L. Miller, Marco Zagha
- IPPS
- 1995

Solution of partial differential equations by either the finite element or the finite difference methods often requires the solution of large, sparse linear systems. When the coefficient matrices associated with these linear systems are symmetric and positive definite, the systems are often solved iteratively using the preconditioned conjugate gradient…
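For reference, the preconditioned conjugate gradient iteration the abstract refers to looks like this with the simplest (Jacobi, i.e. diagonal) preconditioner — a minimal sketch for context; the paper itself studies more powerful preconditioners than this one:

```python
import numpy as np

def pcg(A, b, M_inv_diag, tol=1e-10, max_iter=100):
    """Preconditioned conjugate gradient for SPD A with a diagonal
    preconditioner M = diag(A): pass its elementwise inverse as
    M_inv_diag. Each iteration costs one matvec plus vector ops."""
    x = np.zeros_like(b)
    r = b - A @ x                  # residual
    z = M_inv_diag * r             # preconditioned residual
    p = z.copy()                   # search direction
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```

The preconditioner's job is to cluster the eigenvalues of the iterated operator so that far fewer iterations are needed; the better the preconditioner, the cheaper the solve overall.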

- Guy E. Blelloch, Siddhartha Chatterjee, Marco Zagha
- IPPS
- 1992

We present a variation of the partition method for solving mth-order linear recurrences that is well-suited to vector multiprocessors. The algorithm fully utilizes both vector and multiprocessor capabilities, and reduces the number of memory accesses as compared to the more commonly used version of the partition method. Our variation uses a general loop…
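The partition method can be sketched for the first-order case x[i] = a[i]·x[i-1] + b[i] (a sequential illustration of the three-phase structure; the paper's variation, block layout, and memory-access optimizations are not reproduced here):

```python
def solve_recurrence_blocked(a, b, x0, nblocks=4):
    """Solve x[i] = a[i]*x[i-1] + b[i] by the partition method:
    (1) each block reduces its coefficients to one affine map (A, B)
        with x_block_end = A * x_block_start_prev + B;
    (2) a short sequential scan over the nblocks maps yields each
        block's starting value;
    (3) each block fills in its own values independently.
    Phases 1 and 3 run in parallel across blocks on a multiprocessor."""
    n = len(a)
    bounds = [n * k // nblocks for k in range(nblocks + 1)]
    maps = []                                   # phase 1
    for k in range(nblocks):
        A, B = 1.0, 0.0
        for i in range(bounds[k], bounds[k + 1]):
            A, B = a[i] * A, a[i] * B + b[i]    # compose affine maps
        maps.append((A, B))
    starts = [x0]                               # phase 2 (sequential)
    for A, B in maps[:-1]:
        starts.append(A * starts[-1] + B)
    x = [0.0] * n                               # phase 3
    for k in range(nblocks):
        prev = starts[k]
        for i in range(bounds[k], bounds[k + 1]):
            prev = a[i] * prev + b[i]
            x[i] = prev
    return x
```

The sequential dependence is confined to phase 2, whose length is the number of blocks rather than n, which is what makes the recurrence parallelizable at all.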
