Publications
The Landscape of Parallel Computing Research: A View from Berkeley
TLDR
The parallel landscape is framed by seven questions, and the following are recommended to explore the design space rapidly:
• The overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems.
• The target should be 1000s of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (million instructions per second) per watt, MIPS per area of silicon, and MIPS per development dollar.
Optimization of sparse matrix-vector multiplication on emerging multicore platforms
TLDR
This work examines sparse matrix-vector multiply (SpMV), one of the most heavily used kernels in scientific computing, across a broad spectrum of multicore designs, and presents several optimization strategies that are especially effective in the multicore environment.
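For context, the sketch below shows a baseline CSR (compressed sparse row) SpMV kernel in C of the kind such optimization work starts from. The struct layout and names are illustrative assumptions, not the authors' code.

```c
/* Minimal CSR sparse matrix-vector multiply: y = A*x.
 * Layout and names are assumed for illustration only. */
#include <stddef.h>

typedef struct {
    size_t nrows;
    const size_t *row_ptr;  /* length nrows+1: start of each row in col_idx/vals */
    const size_t *col_idx;  /* column index of each stored nonzero */
    const double *vals;     /* value of each stored nonzero */
} csr_matrix;

void spmv_csr(const csr_matrix *A, const double *x, double *y)
{
    for (size_t i = 0; i < A->nrows; i++) {
        double sum = 0.0;
        for (size_t k = A->row_ptr[i]; k < A->row_ptr[i + 1]; k++)
            sum += A->vals[k] * x[A->col_idx[k]];
        y[i] = sum;
    }
}
```

The irregular, indirect access to x (through col_idx) is what makes this kernel memory-bound and a natural target for the blocking and format optimizations the paper studies.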
Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud
TLDR
This work represents the most comprehensive evaluation to date comparing conventional HPC platforms to Amazon EC2, using real applications representative of the workload at a typical supercomputing center. Results indicate that EC2 is six times slower than a typical mid-range Linux cluster and twenty times slower than a modern HPC system.
Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures
TLDR
This work explores multicore stencil (nearest-neighbor) computations, a class of algorithms at the heart of many structured-grid codes, including PDE solvers. It develops a number of effective optimization strategies and builds an auto-tuning environment that searches over these strategies to minimize runtime while maximizing performance portability.
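As a reference point, a naive 7-point 3D stencil sweep of the kind such auto-tuners begin with might look like the C sketch below; the coefficient names and indexing scheme are assumptions for illustration, not code from the paper.

```c
/* Naive 7-point 3D stencil sweep over an nx*ny*nz grid (boundary untouched).
 * alpha/beta coefficients and layout are illustrative assumptions. */
void stencil7(int nx, int ny, int nz,
              const double *in, double *out, double alpha, double beta)
{
#define IDX(i, j, k) ((size_t)(i) + (size_t)nx * ((size_t)(j) + (size_t)ny * (size_t)(k)))
    for (int k = 1; k < nz - 1; k++)
        for (int j = 1; j < ny - 1; j++)
            for (int i = 1; i < nx - 1; i++)
                out[IDX(i, j, k)] = alpha * in[IDX(i, j, k)]
                    + beta * (in[IDX(i - 1, j, k)] + in[IDX(i + 1, j, k)]
                            + in[IDX(i, j - 1, k)] + in[IDX(i, j + 1, k)]
                            + in[IDX(i, j, k - 1)] + in[IDX(i, j, k + 1)]);
#undef IDX
}
```

An auto-tuner's search space then covers transformations of this loop nest, such as cache and register blocking, software prefetching, and thread placement.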
Using IOR to analyze the I/O performance for HPC platforms
TLDR
This work first analyzed the I/O practices and requirements of current HPC applications and used them as criteria to select a subset of microbenchmarks that reflect the workload requirements. This led to the selection of IOR, an I/O benchmark developed by LLNL for the ASCI Purple procurement, as the tool to study the I/O performance of two HPC platforms.
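To make IOR's style of parameterization concrete, here is a minimal C sketch of a sequential POSIX write phase: a block of block_size bytes is written as a series of transfer_size-byte calls, the two knobs IOR exposes for shaping an I/O pattern. The function name is hypothetical, block_size is assumed to be a multiple of transfer_size, and a real IOR run adds MPI coordination, read/verify phases, and timed barriers.

```c
/* Sketch of an IOR-like write phase: block_size bytes written as
 * block_size/transfer_size sequential write() calls. Illustrative only. */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int write_phase(const char *path, size_t block_size, size_t transfer_size)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    char *buf = malloc(transfer_size);
    if (!buf) { close(fd); return -1; }
    memset(buf, 0xAB, transfer_size);        /* non-zero fill data */
    for (size_t done = 0; done < block_size; done += transfer_size)
        if (write(fd, buf, transfer_size) != (ssize_t)transfer_size)
            break;                           /* short write: stop early */
    free(buf);
    return close(fd);
}
```

Sweeping transfer_size while holding block_size fixed is the typical way such a benchmark exposes how a file system's performance depends on request size.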
Memory Errors in Modern Systems: The Good, The Bad, and The Ugly
TLDR
This study uses data from two leadership-class high-performance computing systems to analyze the reliability impact of hardware resilience schemes deployed in current systems, and finds that counting errors instead of faults, a common practice among researchers and data center operators, can lead to incorrect conclusions about system reliability.
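A small, purely hypothetical C sketch of that distinction: one faulty DRAM location can emit many corrected-error reports, so raw error counts overstate the number of underlying faults, and a crude fault estimate instead counts distinct error locations. The record layout below is invented for illustration.

```c
/* Errors vs. faults: collapse repeated reports from the same location.
 * The err_record layout is a made-up example, not a real log format. */
#include <stdio.h>
#include <stddef.h>

struct err_record { unsigned node, dimm, row; }; /* one logged corrected error */

/* Count distinct (node, dimm, row) locations as a rough fault estimate. */
size_t count_faults(const struct err_record *log, size_t n)
{
    size_t faults = 0;
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < i; j++)
            if (log[j].node == log[i].node && log[j].dimm == log[i].dimm &&
                log[j].row == log[i].row) { seen = 1; break; }
        if (!seen) faults++;
    }
    return faults;
}

int main(void)
{
    /* Five error reports, but only two distinct locations => two faults. */
    struct err_record log[] = {
        {1, 0, 42}, {1, 0, 42}, {1, 0, 42}, {2, 3, 7}, {1, 0, 42}
    };
    printf("errors: 5, estimated faults: %zu\n", count_faults(log, 5));
    return 0;
}
```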
Characterizing and predicting the I/O performance of HPC applications using a parameterized synthetic benchmark
TLDR
It is concluded that IOR is an effective replacement for full-application I/O benchmarks and can bridge the gap in understanding that typically exists between stand-alone benchmarks and the full applications they are intended to model.
The International Exascale Software Project roadmap
TLDR
The work of the community to prepare for the challenges of exascale computing is described, ultimately combining those efforts in a coordinated International Exascale Software Project.
The potential of the cell processor for scientific computing
TLDR
This work introduces a performance model for Cell and applies it to several key scientific computing kernels: dense matrix multiply, sparse matrix-vector multiply, stencil computations, and 1D/2D FFTs. It also proposes modest microarchitectural modifications that could significantly increase the efficiency of double-precision calculations.
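At heart, a performance model of this kind bounds a kernel's runtime by whichever of compute and memory traffic takes longer. A toy C rendering of that idea is sketched below; the peak rates and per-kernel flop/byte counts are placeholders, not measured Cell figures or the paper's actual model.

```c
/* Toy compute-vs-bandwidth bound model. All numbers are placeholders. */
#include <stdio.h>

double predicted_seconds(double flops, double bytes,
                         double peak_flops_per_s, double peak_bytes_per_s)
{
    double t_compute = flops / peak_flops_per_s;
    double t_memory  = bytes / peak_bytes_per_s;
    return t_compute > t_memory ? t_compute : t_memory; /* slower side wins */
}

int main(void)
{
    /* e.g. an SpMV-like kernel: assume ~2 flops and ~12 bytes per nonzero,
     * 1e9 nonzeros, on a machine with 200 GF/s peak and 25 GB/s bandwidth. */
    double t = predicted_seconds(2e9, 12e9, 200e9, 25e9);
    printf("predicted time: %.3f s (memory bound)\n", t);
    return 0;
}
```

Kernels whose memory term dominates, as in this example, are the ones for which the paper's proposed bandwidth-oriented and double-precision refinements matter most.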