To understand the class of polynomial-time solvable problems, we must first have a formal notion of what a "problem" is. We define an abstract problem Q to be a binary relation on a set I of…

The fifth release of the multithreaded language Cilk uses a provably good "work-stealing" scheduling algorithm similar to the first system, but the language has been completely redesigned and the…

This paper describes a circuit transformation called retiming in which registers are added at some points in a circuit and removed from others in such a way that the functional behavior of the circuit…

This article advances the following thesis: transactional memory should be virtualized to support transactions of arbitrary footprint and duration. Such support should be provided through hardware…

This article presents asymptotically optimal algorithms for rectangular matrix transpose, fast Fourier transform (FFT), and sorting on computers with multiple levels of caching. Unlike previous…

The author presents a new class of universal routing networks, called fat-trees, which might be used to interconnect the processors of a general-purpose parallel supercomputer. A fat-tree routing…

The Connection Machine Model CM-5 Supercomputer is a massively parallel computer system designed to offer performance in the range of 1 teraflops (10^12 floating-point operations per second). The CM-5…

A stencil computation repeatedly updates each point of a d-dimensional grid as a function of itself and its near neighbors. Parallel cache-efficient stencil algorithms based on "trapezoidal…

The availability of multicore processors across a wide range of computing platforms has created a strong demand for software frameworks that can harness these resources. This paper overviews the…