Charles E. Leiserson

Learn More
problems<lb>To understand the class of polynomial-time solvable problems, we must first have a formal<lb>notion of what a "problem" is. We define an abstract problem Q to be a binary relation on a<lb>set I of problem instances and a set S of problem solutions. For example, an instance for<lb>SHORTEST-PATH is a triple consisting of a graph and two vertices.(More)
Cilk (pronounced &#8220;silk&#8221;) is a C-based runtime system for multi-threaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the &#8220;work&#8221; and &#8220;critical path&#8221; of a Cilk computation can be used(More)
The fifth release of the multithreaded language Cilk uses a provably good "work-stealing" scheduling algorithm similar to the first system, but the language has been completely redesigned and the runtime system completely reengineered. The efficiency of the new implementation was aided by a clear strategy that arose from a theoretical analysis of the(More)
This paper describes a circuit transformation calledretiming in which registers are added at some points in a circuit and removed from others in such a way that the functional behavior of the circuit as a whole is preserved. We show that retiming can be used to transform a given synchronous circuit into a more efficient circuit under a variety of different(More)
This article presents asymptotically optimal algorithms for rectangular matrix transpose, fast Fourier transform (FFT), and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms, these algorithms are <i>cache oblivious</i>: no variables dependent on hardware parameters, such as cache size and cache-line length, need to be(More)
Hardware transactional memory should support unbounded transactions: transactions of arbitrary size and duration. We describe a hardware implementation of unbounded transactional memory, called UTM, which exploits the common case for performance without sacrificing correctness on transactions whose footprint can be nearly as large as virtual memory. We(More)
The Connection Machine Model CM-5 Supercomputer is a massively parallel computer system designed to offer performance in the range of 1 teraflops (1012 floating-point operations per second). The CM-5 obtains its high performance while offering ease of programming, flexibility, and reliability. The machine contains three communication networks: a data(More)
The complexity of integrated-circuit chips produced today makes it feasible to build inexpensive, special-purpose subsystems that rapidly solve sophisticated problems on behalf of a general-purpose host computer. This paper contributes to the design methodology of efficient VLSI algorithms. We present a transformation that converts synchronous systems into(More)