The Landscape of Parallel Computing Research: A View from Berkeley
The parallel landscape is framed with seven questions, and the following recommendations are made to explore the design space rapidly:
- The overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems.
- The target should be 1000s of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (Million Instructions per Second) per watt, MIPS per area of silicon, and MIPS per development dollar.
OSKI: A Library of Automatically Tuned Sparse Matrix Kernels
An overview of OSKI, which is based on research on automatically tuned sparse kernels for modern cache-based superscalar machines, is provided; the primary aim of its interface is to hide the complex decision-making process needed to tune the performance of a kernel implementation for a particular user's sparse matrix and machine.
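To give a flavor of the interface style described above, the sketch below shows tuned sparse matrix-vector multiply through OSKI's C API. The call names (oski_CreateMatCSR, oski_SetHintMatMult, oski_TuneMat, oski_MatMult) follow the published OSKI interface, but the fragment is an approximate sketch from memory of the documentation, not code taken from the library itself.

```c
#include <oski/oski.h>

/* Hedged sketch of tuned SpMV (y = A*x) via OSKI, assuming the caller
 * already holds A in CSR form (ptr, ind, val) with int indices and
 * double values; call signatures approximate the OSKI documentation. */
void spmv_with_oski(int nrows, int ncols,
                    int *ptr, int *ind, double *val,
                    double *x, double *y)
{
    oski_Init();

    /* Wrap the caller's CSR arrays without copying them. */
    oski_matrix_t A = oski_CreateMatCSR(ptr, ind, val, nrows, ncols,
                                        SHARE_INPUTMAT, 1, INDEX_ZERO_BASED);
    oski_vecview_t xv = oski_CreateVecView(x, ncols, STRIDE_UNIT);
    oski_vecview_t yv = oski_CreateVecView(y, nrows, STRIDE_UNIT);

    /* Hint that SpMV will dominate (roughly 500 calls), then let the
     * tuner pick a data structure, e.g. register blocking, internally. */
    oski_SetHintMatMult(A, OP_NORMAL, 1.0, SYMBOLIC_VEC, 0.0, SYMBOLIC_VEC, 500);
    oski_TuneMat(A);

    /* y = 1.0*A*x + 0.0*y on whatever tuned representation was chosen. */
    oski_MatMult(A, OP_NORMAL, 1.0, xv, 0.0, yv);

    oski_DestroyVecView(xv);
    oski_DestroyVecView(yv);
    oski_DestroyMat(A);
    oski_Close();
}
```

This is exactly the decision-hiding the summary mentions: the caller keeps a plain CSR matrix, and the choice of tuned internal format stays behind oski_TuneMat.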
A case for intelligent RAM
The state of microprocessors and DRAMs today is reviewed, some of the opportunities and challenges for IRAMs are explored, and the performance and energy efficiency of three IRAM designs are estimated.
Optimization of sparse matrix-vector multiplication on emerging multicore platforms
- Samuel Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, J. Demmel
- Computer Science, Proceedings of the ACM/IEEE Conference on…
- 31 July 2007
This work examines sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs, and presents several optimization strategies especially effective for the multicore environment.
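For readers unfamiliar with the kernel itself, below is a minimal reference CSR SpMV in C; the row-parallel OpenMP decomposition shown is only a baseline in the spirit of the strategies studied, not the paper's tuned code.

```c
#include <stddef.h>

/* Reference CSR SpMV: y = A*x, with A in compressed sparse row form
 * (row_ptr, col_ind, val). Parallelizing over rows is the simplest
 * decomposition; the paper studies far more aggressive optimizations
 * (blocking, SIMDization, prefetching, NUMA-aware data placement). */
void spmv_csr(size_t nrows, const size_t *row_ptr, const size_t *col_ind,
              const double *val, const double *x, double *y)
{
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < nrows; i++) {
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col_ind[k]];
        y[i] = sum;
    }
}
```

The irregular, input-dependent access pattern on x is what makes this kernel so sensitive to the memory system, and hence so different across the multicore designs the paper compares.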
A view of the parallel computing landscape
Writing programs that scale with increasing numbers of cores should be as easy as writing programs for sequential computers.
Titanium: A High-performance Java Dialect
This work discusses the main additions to Java, namely immutable classes, multidimensional arrays, an explicitly parallel SPMD model of computation with a global address space, and zone-based memory management, and reports progress on the development of Titanium.
Parallel programming in Split-C
- D. Culler, A. Arpaci-Dusseau, K. Yelick
- Computer Science, Supercomputing '93 Proceedings
- 1 December 1993
The authors introduce the Split-C language, a parallel extension of C intended for high performance programming on distributed memory multiprocessors, and demonstrate the use of the language in…
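Since the abstract is truncated, the fragment below is only a hedged illustration of Split-C's signature feature, split-phase assignment, as best recalled from the paper; the `:=` get and `sync()` names are quoted from memory and should not be read as authoritative.

```c
/* Hedged Split-C sketch (requires the Split-C compiler): split-phase
 * assignment (:=) issues remote reads without blocking, and sync()
 * waits for all outstanding operations, overlapping communication
 * with computation on distributed-memory multiprocessors. */
void gather(int *global *ptrs, int *vals, int n)
{
    int i;
    /* Issue all remote reads up front; none of them blocks. */
    for (i = 0; i < n; i++)
        vals[i] := *ptrs[i];
    sync();  /* block until every outstanding get has completed */
}
```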
UPC: Distributed Shared-Memory Programming
This tutorial jumps right into the power of UPC without dragging you through basic programming, with examples of both the UPC Programming Model and UPC Library in action.
Introduction to UPC and Language Specification
UPC is a parallel extension of the C programming language intended for multiprocessors with a common global address space to provide efficient access to the underlying machine and to establish a common syntax and semantics for explicitly parallel programming in C.
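The explicitly parallel style this describes can be sketched as follows, using standard UPC constructs (shared, upc_forall, upc_barrier, MYTHREAD, THREADS); the program is an illustrative fragment, not an excerpt from the specification.

```c
#include <upc.h>
#include <stdio.h>

#define N 1024

/* One array in the shared global address space; by default its elements
 * are distributed cyclically across all threads of the SPMD program. */
shared double v[N];

int main(void)
{
    int i;

    /* The affinity expression &v[i] restricts each thread to the
     * iterations whose element it owns, keeping every access local. */
    upc_forall (i = 0; i < N; i++; &v[i])
        v[i] = 2.0 * i;

    upc_barrier;  /* all threads finish their updates before any reads */

    if (MYTHREAD == 0)
        printf("v[%d] = %g, computed by %d threads\n", N - 1, v[N - 1], THREADS);
    return 0;
}
```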
Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures
- K. Datta, M. Murphy, K. Yelick
- Computer Science, SC - International Conference for High…
- 15 November 2008
This work explores multicore stencil (nearest-neighbor) computations - a class of algorithms at the heart of many structured grid codes, including PDE solvers - develops a number of effective optimization strategies, and builds an auto-tuning environment that searches over these optimizations and their parameters to minimize runtime while maximizing performance portability.
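To make the kernel class concrete, here is a baseline 7-point 3D stencil sweep in C, the sort of loop nest such an auto-tuner transforms via core and cache blocking, unrolling, and prefetching; it is a sketch of the untuned starting point, not the paper's code.

```c
/* Baseline 7-point 3D stencil (Jacobi-style sweep): each interior point
 * becomes a weighted sum of itself and its six nearest neighbors.
 * An auto-tuner searches over blockings/unrollings of this loop nest. */
#define IDX(i, j, k, ny, nz) (((i) * (ny) + (j)) * (nz) + (k))

void stencil7(int nx, int ny, int nz, double a, double b,
              const double *in, double *out)
{
    for (int i = 1; i < nx - 1; i++)
        for (int j = 1; j < ny - 1; j++)
            for (int k = 1; k < nz - 1; k++)
                out[IDX(i, j, k, ny, nz)] =
                    a * in[IDX(i, j, k, ny, nz)] +
                    b * (in[IDX(i - 1, j, k, ny, nz)] + in[IDX(i + 1, j, k, ny, nz)] +
                         in[IDX(i, j - 1, k, ny, nz)] + in[IDX(i, j + 1, k, ny, nz)] +
                         in[IDX(i, j, k - 1, ny, nz)] + in[IDX(i, j, k + 1, ny, nz)]);
}
```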