• Corpus ID: 62143065

The Landscape of Parallel Computing Research: A View from Berkeley

@inproceedings{Asanovi2006TheLO,
  title={The Landscape of Parallel Computing Research: A View from Berkeley},
  author={Krste Asanovi{\'c} and Rastislav Bod{\'i}k and Bryan Catanzaro and Joseph Gebis and Parry Husbands and Kurt Keutzer and David A. Patterson and William Plishker and John Shalf and Samuel Williams and Katherine A. Yelick},
  year={2006}
}
Author(s): Asanovic, K; Bodik, R; Catanzaro, B; Gebis, J; Husbands, P; Keutzer, K; Patterson, D; Plishker, W; Shalf, J; Williams, S; Yelick, K | Abstract: The recent switch to parallel microprocessors is a milestone in the history of computing. Industry has laid out a roadmap for multicore designs that preserves the programming paradigm of the past via binary compatibility and cache coherence. Conventional wisdom is now to double the number of cores on a chip with each silicon generation. A…

Citations of this paper

Auto-tuning performance on multicore computers

TLDR
It is shown that auto-tuning consistently delivers speedups in excess of 3× across all multicore computers except the memory-bound Intel Clovertown, where the benefit was as little as 1.5×.
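The auto-tuning loop behind results like these is simple to sketch: run the same kernel under several candidate blocking parameters, time each, and keep the fastest. Below is a minimal illustration in C; the triad kernel, candidate list, and single-run timing are assumptions for illustration, not the paper's actual tuner (a real tuner would warm up caches and average repeated runs).

/* Minimal auto-tuning sketch: time one kernel under several
 * candidate block sizes and keep the fastest configuration. */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 22)

/* Illustrative kernel: a blocked triad, parameterized by block size. */
static void triad_blocked(double *a, const double *b, const double *c,
                          size_t n, size_t bs)
{
    for (size_t i = 0; i < n; i += bs) {
        size_t end = i + bs < n ? i + bs : n;
        for (size_t j = i; j < end; j++)
            a[j] = b[j] + 3.0 * c[j];
    }
}

static double seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    for (size_t i = 0; i < N; i++) { b[i] = i; c[i] = 2.0 * i; }

    size_t candidates[] = { 64, 256, 1024, 4096, 16384 };
    size_t best_bs = 0;
    double best_t = 1e30;

    for (size_t k = 0; k < sizeof candidates / sizeof *candidates; k++) {
        double t0 = seconds();
        triad_blocked(a, b, c, N, candidates[k]);
        double dt = seconds() - t0;
        printf("block %6zu: %.4f s\n", candidates[k], dt);
        if (dt < best_t) { best_t = dt; best_bs = candidates[k]; }
    }
    printf("selected block size: %zu\n", best_bs);
    free(a); free(b); free(c);
    return 0;
}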

Communication for programmability and performance on multi-core processors

TLDR
This dissertation considers the programmability challenges of the multi-core era, proposes an asynchronous remote store instruction that one core issues and another completes asynchronously into its own local cache, and evaluates several patterns of parallel communication.
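The remote store itself is a proposed hardware instruction, so only its communication pattern can be shown in software. Below is a minimal C11 analogue, assuming a single-slot mailbox the producer fills and the consumer polls from its own cache; the mailbox layout and names are illustrative.

/* Software analogue of the remote-store pattern: the producer writes
 * directly into a slot the consumer polls locally. The real proposal
 * is a hardware instruction; this only mimics the one-way pattern. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

typedef struct {
    int payload;          /* data "stored remotely" by the producer */
    atomic_int ready;     /* flag the consumer spins on */
} mailbox;

static mailbox box;

static void *producer(void *arg)
{
    (void)arg;
    box.payload = 42;     /* the "remote store" into the consumer's slot */
    atomic_store_explicit(&box.ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    /* The flag stays in the consumer's cache until the producer writes. */
    while (!atomic_load_explicit(&box.ready, memory_order_acquire))
        ;
    printf("received %d\n", box.payload);
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}

Compile with -pthread; the release/acquire pair orders the payload write before the flag the consumer observes.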

Optimization of Scientific Computation for Multicore Systems

TLDR
This thesis examines the problems of sorting, matrix multiplication, and ordinary differential equation initial value problems on two target architectures, the Cell Broadband Engine and Nvidia CUDA-enabled graphics processors, to exploit various levels of parallelism.

Pitfalls and Issues of Manycore Programming

HPPC 2007: Workshop on Highly Parallel Processing on a Chip

TLDR
It is argued that the PRAM-On-Chip approach is a promising candidate for the processor of the future, and that focusing on a small number of promising approaches would benefit both the field as a whole and individual researchers seeking greater impact.

The Parallel Revolution Has Started: Are You Part of the Solution or Part of the Problem? - An Overview of Research at the Berkeley Parallel Computing Laboratory

TLDR
This talk gives an update on where the Par Lab stands two years on, including a surprisingly compact set of recurring computational patterns, termed "motifs", and argues that any successful software architecture, parallel or serial, can be described as a hierarchy of patterns.

OpenCL and the 13 dwarfs: a work in progress

TLDR
The goal of this combination "Work-in-Progress and Vision" paper is to delineate application requirements in a manner that is not overly specific to individual applications or to the optimizations used for particular hardware platforms, so that broader conclusions can be drawn about hardware requirements.
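For readers new to the dwarfs: they are the thirteen recurring computational patterns the Berkeley View proposes in place of application-specific benchmarks. One concrete instance, a compressed-sparse-row matrix-vector multiply from the sparse linear algebra dwarf, sketched in C (the 3x3 matrix is illustrative):

/* CSR sparse matrix-vector multiply: one kernel from the
 * "sparse linear algebra" dwarf. The 3x3 matrix is illustrative. */
#include <stdio.h>

int main(void)
{
    /* Matrix [[4 0 1], [0 3 0], [2 0 5]] in compressed sparse row form. */
    int    rowptr[] = { 0, 2, 3, 5 };
    int    colidx[] = { 0, 2, 1, 0, 2 };
    double vals[]   = { 4, 1, 3, 2, 5 };
    double x[] = { 1, 2, 3 }, y[3];

    for (int i = 0; i < 3; i++) {
        double sum = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            sum += vals[k] * x[colidx[k]];
        y[i] = sum;
    }
    printf("y = [%g %g %g]\n", y[0], y[1], y[2]);  /* [7 6 17] */
    return 0;
}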

The Future of Accelerator Programming: Abstraction, Performance or Can We Have Both?

TLDR
This paper investigates current approaches to portable accelerator programming, seeking to answer whether they make it possible to combine high efficiency with sufficient algorithm abstraction, and presents three approaches of writing portable code: GPU-centric, CPU-centric and combined.

Operating System Support for Parallel Processes

TLDR
This work describes the MCP abstraction and the salient details of Akaros, and discusses how the kernel and user-level libraries work together to give an application control over its physical resources and to adapt to the revocation of cores at any time, even when the code is holding locks.

Can One-Chip Parallel Computing Be Liberated From Ad Hoc Solutions? A Computation Model Based Approach and Its Implementation

TLDR
This work proposes a fresh restart of parallel computation based on the synergetic interaction of a parallel computing model (Kleene's model of partial recursive functions), an abstract machine model, an adequate architecture, and a friendly programming environment, together with a simple and efficient generic structure.
...

References

Showing 1-10 of 141 references

RAMP: research accelerator for multiple processors - a community vision for a shared experimental parallel HW/SW platform

TLDR
RAMP, the Research Accelerator for Multiple Processors, has the potential to transform the parallel computing community in computer science from a simulation-driven to a prototype-driven discipline, enabling rapid iteration across the interfaces of the many fields that touch multiprocessors and thereby moving much more quickly to a parallel foundation for large-scale computer systems research in the 21st century.

X10: an object-oriented approach to non-uniform cluster computing

TLDR
X10, a modern object-oriented programming language, is designed for high-performance, high-productivity programming of NUCC systems; an overview of the X10 programming model and language, experience with the reference implementation, and results from initial productivity comparisons between the X10 and Java™ languages are presented.

The Parallel Computing Laboratory at U.C. Berkeley: A Research Agenda Based on the Berkeley View

TLDR
This report is based on a proposal for creating a Universal Parallel Computing Research Center (UPCRC) that a technical committee from Intel and Microsoft unanimously selected as the top proposal in a competition among the top 25 computer science departments.

Microprocessors for the new millennium: Challenges, opportunities, and new frontiers

  • P. Gelsinger
  • Computer Science
    2001 IEEE International Solid-State Circuits Conference. Digest of Technical Papers. ISSCC (Cat. No.01CH37177)
  • 2001
TLDR
Future microprocessors will evolve as integration of DSP capabilities becomes imperative to enable such applications as media-rich communications, computer vision, and speech recognition, which will lead to a change in the computing paradigm from today's data-based, machine-based computing to tomorrow's knowledge-based, human-based computing.

The Tera computer system

TLDR
The Tera architecture was designed with several goals in mind; it needed to be suitable for very high-speed implementations.

An Experiment in Measuring the Productivity of Three Parallel Programming Languages

TLDR
Interesting insights have been obtained into the problem-solving processes of novice parallel programmers, exposing productivity pitfalls in each language as well as significant differences among individuals and groups.

A stream compiler for communication-exposed architectures

TLDR
This paper describes a fully functional compiler that parallelizes StreamIt applications for Raw, including several load-balancing transformations, and demonstrates that the StreamIt compiler can automatically map a high-level stream abstraction to Raw without losing performance.
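The stream abstraction being compiled is a pipeline of filters, each consuming and producing a fixed number of items per firing, which is what lets a compiler statically schedule stages onto tiles. A minimal C sketch of two such stages follows; the filters are hypothetical and only illustrate the model, not StreamIt syntax or the Raw backend.

/* Pipeline-of-filters sketch: each stage has fixed pop/push rates,
 * so a stream compiler can statically schedule and load-balance it.
 * The filters here are hypothetical. */
#include <stdio.h>

#define LEN 8

/* Stage 1: pop 1, push 1 -- scale each sample. */
static void scale(const int *in, int *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = 3 * in[i];
}

/* Stage 2: pop 2, push 1 -- pairwise sum, a simple rate change. */
static void pairsum(const int *in, int *out, int n_out)
{
    for (int i = 0; i < n_out; i++)
        out[i] = in[2 * i] + in[2 * i + 1];
}

int main(void)
{
    int src[LEN], mid[LEN], dst[LEN / 2];
    for (int i = 0; i < LEN; i++)
        src[i] = i;

    scale(src, mid, LEN);        /* stage 1 */
    pairsum(mid, dst, LEN / 2);  /* stage 2 */

    for (int i = 0; i < LEN / 2; i++)
        printf("%d ", dst[i]);   /* prints: 3 15 27 39 */
    putchar('\n');
    return 0;
}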

High-level programming language abstractions for advanced and dynamic parallel computations

TLDR
By including a set of p-dependent abstractions in a language with a largely p-independent framework, the task of parallel programming is greatly simplified, and ZPL code is shown to be easier to write than MPI code while remaining competitive with MPI in performance.

The Cascade High Productivity Language

  • D. Callahan, B. Chamberlain, H. Zima
  • Computer Science
    Ninth International Workshop on High-Level Parallel Programming Models and Supportive Environments, 2004. Proceedings.
  • 2004
TLDR
The design of Chapel, the Cascade High Productivity Language, is described; Chapel is being developed in the DARPA-funded HPCS project Cascade, led by Cray Inc., and pushes the state of the art in languages for HEC system programming by focusing on productivity.

Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology

TLDR
A BLAS GEMM compatible multi-level cache-blocked matrix multiply generator which produces code that achieves around 90% of peak on the Sparcstation-20/61, IBM RS/6000-590, HP 712/80i, SGI Power Challenge R8k, and SGI Octane R10k, and over 80% of peak on the SGI Indigo R4k.
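The transformation at the heart of PHiPAC-generated code is cache blocking with machine-tuned block sizes. A minimal hand-written blocked GEMM in C shows the structure; BS stands in for the parameter a PHiPAC-style search would sweep, and this sketch is not PHiPAC output.

/* Cache-blocked C += A*B for n x n row-major matrices. BS is the
 * tunable block size a PHiPAC-style search would sweep per machine. */
#include <stdio.h>
#include <stdlib.h>

#define BS 64   /* illustrative block size */

static void gemm_blocked(size_t n, const double *A, const double *B,
                         double *C)
{
    for (size_t ii = 0; ii < n; ii += BS)
        for (size_t kk = 0; kk < n; kk += BS)
            for (size_t i = ii; i < ii + BS && i < n; i++)
                for (size_t k = kk; k < kk + BS && k < n; k++) {
                    double a = A[i * n + k];   /* reused across the j loop */
                    for (size_t j = 0; j < n; j++)
                        C[i * n + j] += a * B[k * n + j];
                }
}

int main(void)
{
    size_t n = 128;
    double *A = calloc(n * n, sizeof *A);
    double *B = calloc(n * n, sizeof *B);
    double *C = calloc(n * n, sizeof *C);
    for (size_t i = 0; i < n * n; i++) { A[i] = 1.0; B[i] = 2.0; }
    gemm_blocked(n, A, B, C);
    printf("C[0][0] = %g (expect %g)\n", C[0], 2.0 * n);
    free(A); free(B); free(C);
    return 0;
}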
...