Brian Koblenz

The Tera architecture was designed with several major goals in mind. First, it needed to be suitable for very high speed implementations, i.e., admit a short clock period and be scalable to many processors. This goal will be achieved; a maximum configuration of the first implementation of the architecture will have 256 processors, 512 memory units, 256 …
This paper describes an integrated architecture, compiler, runtime, and operating system solution to exploiting heterogeneous parallelism. The architecture is a pipelined multi-threaded multiprocessor, enabling the execution of very fine (multiple operations within an instruction) to very coarse (multiple jobs) parallel activities. The compiler and runtime …
The Tera MTA is a revolutionary commercial computer based on a multithreaded processor architecture. In contrast to many other parallel architectures, the Tera MTA can effectively use high amounts of parallelism on a single processor. By running multiple threads on a single processor, it can tolerate memory latency and keep the processor saturated. If …
On recent high-performance multiprocessors, there is a potential conflict between the goals of achieving the full performance potential of the hardware and providing a parallel programming environment that makes effective use of programmer effort. On one hand, an explicit coarse-grain programming style may appear to be necessary, both to achieve good cache …
The directory-based cache coherence protocol for the DASH multiprocessor. The queue-read queue-write asynchronous PRAM model. Parallel algorithms for shared-memory machines. [Cyp88] R. Cypher. Valiant's maximum algorithm with sequential memory accesses. Techni… compaction, load balancing, generating a random permutation and parallel hashing. These …