Robert Schreiber

A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of five parallel kernels and three simulated application benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics (CFD) applications. The principal …
We expect that many-core microprocessors will push performance per chip from the 10 gigaflop to the 10 teraflop range in the coming decade. To support this increased performance, memory and inter-core bandwidths will also have to scale by orders of magnitude. Pin limitations, the energy cost of electrical signaling, and the non-scalability of chip-length …
Many recommendation systems suggest items to users by applying collaborative filtering (CF) to historical records of items that the users have viewed, purchased, or rated. Two major problems that most CF approaches have to resolve are scalability and sparseness of the user profiles. In this paper, we describe …
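As a rough illustration of the collaborative-filtering setting described above, the sketch below predicts a missing rating as a similarity-weighted average over other users' ratings. All users, items, and values are invented, and the paper's own techniques for scalability and sparse profiles are not reproduced here.

```python
# Minimal user-based collaborative-filtering sketch (illustrative only).
import numpy as np

# Rows = users, columns = items; 0 means "not rated" (a sparse profile).
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity computed over co-rated items only."""
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other in range(ratings.shape[0]):
        if other == user or ratings[other, item] == 0:
            continue
        s = cosine_sim(ratings[user], ratings[other])
        num += s * ratings[other, item]
        den += abs(s)
    return num / den if den else 0.0

print(predict(user=1, item=1))  # estimate user 1's missing rating for item 1
```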
The matrix computation language and environment MATLAB is extended to include sparse matrix storage and operations. The only change to the outward appearance of the MATLAB language is a pair of commands to create full or sparse matrices. Nearly all the operations of MATLAB now apply equally to full or sparse matrices, without any explicit action by the …
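The same design idea, storage format chosen by explicit conversion commands while ordinary operations work on either representation, can be sketched in Python with SciPy. This is only an analogue of the MATLAB extension described above, not the extension itself.

```python
# Python/SciPy analogue: the same matrix-vector product applies whether the
# matrix is stored full (dense) or sparse; only the conversion is explicit.
import numpy as np
from scipy import sparse

A_full = np.array([[4.0, 0.0, 0.0],
                   [0.0, 3.0, 1.0],
                   [0.0, 1.0, 2.0]])

A_sparse = sparse.csr_matrix(A_full)   # analogue of "sparse(A)": store only nonzeros
B_full = A_sparse.toarray()            # analogue of "full(A)": back to dense storage

x = np.array([1.0, 2.0, 3.0])

print(A_full @ x)     # dense product
print(A_sparse @ x)   # same expression, sparse storage
```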
Generating local addresses and communication sets is an important issue in distributed-memory implementations of data-parallel languages such as High Performance Fortran. We show that for an array A affinely aligned to a template that is distributed across p processors with a cyclic(k) …
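The index arithmetic underlying a cyclic(k) distribution can be sketched as follows. This is a generic illustration of ownership and local-address computation across p processors; the affine alignment to a template and the communication-set generation treated in the paper are omitted.

```python
# Ownership and local-address computation for a plain cyclic(k) distribution.
def owner(i, k, p):
    """Processor that owns global index i under a cyclic(k) distribution."""
    return (i // k) % p

def local_index(i, k, p):
    """Local address of global index i on its owning processor."""
    course = i // (k * p)   # which local block ("course") the element falls in
    offset = i % k          # position within that block
    return course * k + offset

def local_elements(n, k, p, proc):
    """Enumerate (global, local) index pairs owned by processor `proc`."""
    return [(i, local_index(i, k, p)) for i in range(n) if owner(i, k, p) == proc]

# Blocks of size 3 dealt round-robin to 2 processors:
print(local_elements(n=20, k=3, p=2, proc=1))
```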
We describe several new bottom-up approaches to problems in role engineering for Role-Based Access Control (RBAC). The salient problems are all NP-complete, even to approximate, yet we find that in instances that arise in practice these problems can be solved in minutes. We first consider role minimization, the process of finding a smallest collection of …
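Because the exact problem is intractable in general, heuristics are the natural route in practice. The sketch below is one naive greedy heuristic for the role-minimization flavour of role mining, picking candidate roles that cover the most not-yet-covered user-permission assignments; the users, permissions, and candidate-generation rule are all invented, and the paper's actual algorithms are not reproduced here.

```python
# Naive greedy heuristic for role minimization (illustrative sketch only).
from itertools import combinations

# user -> set of permissions (the user-permission assignment to be covered)
upa = {
    "alice": {"read", "write", "deploy"},
    "bob":   {"read", "write"},
    "carol": {"read", "audit"},
}

# Candidate roles: each user's permission set plus pairwise intersections.
candidates = [frozenset(p) for p in upa.values()]
candidates += [frozenset(a & b) for a, b in combinations(upa.values(), 2) if a & b]

uncovered = {(u, perm) for u, perms in upa.items() for perm in perms}
roles = []
while uncovered:
    def gain(role):
        # A role may only be assigned to users whose permissions include all of it.
        return sum(1 for (u, perm) in uncovered
                   if perm in role and role <= upa[u])
    best = max(candidates, key=gain)
    if gain(best) == 0:
        break
    roles.append(best)
    uncovered -= {(u, perm) for (u, perm) in uncovered
                  if perm in best and best <= upa[u]}

print(roles)  # a small (not necessarily minimal) set of covering roles
```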
By providing high-bandwidth chip-wide communication at low latency and low power, on-chip optics can improve many-core performance dramatically. Optical channels that connect many nodes and allow for single-cycle cache-line transmissions will require fast, high-bandwidth arbitration. We exploit CMOS nanophotonic devices to create arbiters that meet the …
Compared to the customary column-oriented approaches, block-oriented, distributed-memory sparse Cholesky factorization benefits from an asymptotic reduction in interprocessor communication volume and an asymptotic increase in the amount of concurrency that is exposed in the problem. Unfortunately, block-oriented approaches (specifically, the block fan-out …
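To make the block-oriented structure concrete, the sketch below is a sequential, dense, right-looking blocked Cholesky factorization: factor a diagonal block, triangular-solve the panel beneath it, then update the trailing submatrix. It is only a minimal illustration of blocking; the sparse, distributed block fan-out method discussed above is not reproduced.

```python
# Sequential right-looking blocked Cholesky sketch (dense, illustrative only).
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def blocked_cholesky(A, b=2):
    """Return lower-triangular L with A = L @ L.T, working in b-by-b blocks."""
    A = A.copy()
    n = A.shape[0]
    L = np.zeros_like(A)
    for j in range(0, n, b):
        e = min(j + b, n)
        # Factor the diagonal block.
        L[j:e, j:e] = cholesky(A[j:e, j:e], lower=True)
        if e < n:
            # Triangular solve for the panel below the diagonal block.
            L[e:, j:e] = solve_triangular(L[j:e, j:e], A[e:, j:e].T, lower=True).T
            # Rank-b update of the trailing submatrix.
            A[e:, e:] -= L[e:, j:e] @ L[e:, j:e].T
    return L

# Small symmetric positive-definite test.
M = np.random.default_rng(0).standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)
L = blocked_cholesky(A, b=2)
print(np.allclose(L @ L.T, A))
```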
Blocked algorithms have much better data locality and can therefore be far more efficient than ordinary algorithms when a memory hierarchy is involved. On the other hand, they are very difficult to write and to tune for particular machines. Here we consider the reorganization of nested loops through the use of known program transformations in …
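The kind of loop reorganization in question can be illustrated with the classic example of tiling a matrix multiply so that each small tile of the operands is reused while it is resident in a fast level of the memory hierarchy. The sketch below only shows the shape of the transformation (pure-Python loops are not themselves fast), and the block size b is an arbitrary illustrative choice.

```python
# Loop tiling (blocking) applied to the three-loop matrix multiply.
import numpy as np

def matmul_blocked(A, B, b=32):
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    # Iterate over b-by-b tiles so each tile of A, B, and C is reused
    # before moving on, improving locality in a memory hierarchy.
    for ii in range(0, n, b):
        for kk in range(0, k, b):
            for jj in range(0, m, b):
                C[ii:ii+b, jj:jj+b] += A[ii:ii+b, kk:kk+b] @ B[kk:kk+b, jj:jj+b]
    return C

A = np.random.rand(100, 80)
B = np.random.rand(80, 60)
print(np.allclose(matmul_blocked(A, B, b=32), A @ B))
```

Choosing the block size to match the cache (or register) capacity of a particular machine is exactly the tuning burden that motivates doing such transformations automatically.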