Learn More
SUMMARY A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the "Parallel Kernels," and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications.(More)
We expect that many-core microprocessors will push performance per chip from the 10 gigaflop to the 10 teraflop range in the coming decade. To support this increased performance, memory and inter-core bandwidths will also have to scale by orders of magnitude. Pin limitations, the energy cost of electrical signaling, and the non-scalability of chip-length(More)
Many recommendation systems suggest items to users by utilizing the techniques of collaborative filtering (CF) based on historical records of items that the users have viewed, purchased, or rated. Two major problems that most CF approaches have to resolve are scal-ability and sparseness of the user profiles. In this paper, we describe(More)
We describe several new bottom-up approaches to problems in role engineering for Role-Based Access Control (RBAC). The salient problems are all NP-complete, even to approximate, yet we find that in instances that arise in practice these problems can be solved in minutes. We first consider role minimization, the process of finding a smallest collection of(More)
By providing high bandwidth chip-wide communication at low latency and low power, on-chip optics can improve many-core performance dramatically. Optical channels that connect many nodes and allow for single cycle cache-line transmissions will require fast, high bandwidth arbitration. We exploit CMOS nanophotonic devices to create arbiters that meet the(More)
Generating local addresses and communication sets is an important issue in distributed-memory implementations of data-parallel languages such as High Performance Fortran. We show that for an array <italic>A</italic> affinely aligned to a <italic>template</italic> that is distributed across <italic>p</italic> processors with a <italic>cyclic(k)</italic>(More)