Roofline: an insightful visual performance model for multicore architectures

@article{Williams2009RooflineAI,
  title={Roofline: an insightful visual performance model for multicore architectures},
  author={Samuel Williams and Andrew Waterman and David Patterson},
  journal={Commun. ACM},
  year={2009},
  volume={52},
  pages={65--76}
}
The Roofline model offers insight on how to improve the performance of software and hardware. 
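
As a rough illustration of the bound the Roofline model draws, attainable performance is the minimum of peak floating-point performance and peak memory bandwidth multiplied by operational intensity (flops per byte of memory traffic). The sketch below assumes a hypothetical machine with 100 GFlop/s peak compute and 20 GB/s peak bandwidth; the function names and numbers are illustrative only and are not taken from the paper.

```python
def attainable_gflops(peak_gflops: float,
                      peak_bandwidth_gbs: float,
                      operational_intensity: float) -> float:
    """Roofline bound in GFlop/s.

    operational_intensity is in flops per byte of memory traffic.
    """
    if operational_intensity < 0:
        raise ValueError("operational intensity must be non-negative")
    # The memory-bound slope is bandwidth * intensity; the flat roof is peak compute.
    return min(peak_gflops, peak_bandwidth_gbs * operational_intensity)


def ridge_point(peak_gflops: float, peak_bandwidth_gbs: float) -> float:
    """Operational intensity (flops/byte) at which the two roofs meet."""
    return peak_gflops / peak_bandwidth_gbs


if __name__ == "__main__":
    # Hypothetical machine parameters, chosen only for illustration.
    PEAK_GFLOPS = 100.0
    PEAK_BW_GBS = 20.0

    print("ridge point (flops/byte):", ridge_point(PEAK_GFLOPS, PEAK_BW_GBS))
    for oi in (0.5, 2.0, 8.0):
        bound = attainable_gflops(PEAK_GFLOPS, PEAK_BW_GBS, oi)
        print(f"intensity {oi}: at most {bound} GFlop/s")
```

A kernel whose operational intensity falls to the left of the ridge point is limited by memory bandwidth under this model; one to the right is limited by compute.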

Cache-aware Roofline model: Upgrading the loft
TLDR
This paper analyzes the original Roofline model and proposes a novel approach to provide a more insightful performance modeling of modern architectures by introducing cache-awareness, thus significantly improving the guidelines for application optimization.
Auto-Tuning the 27-point Stencil for Multicore
TLDR
This study illustrates how auto-tuning can be used to produce very efficient implementations across a diverse set of current multicore architectures.
The boat hull model: adapting the roofline model to enable performance prediction for parallel computing
TLDR
This work modifies the roofline model to include class information, enabling architectural choice through performance prediction before architecture-specific code is developed, and shows for 6 example algorithms that performance is predicted accurately without requiring code to be available.
A Roofline Visualization Framework
TLDR
An initial implementation of the third component, a system for visualizing roofline charts and managing roofline performance analysis data, is introduced, and the implementation of and rationale for integrating the roofline visualization system into the Eclipse IDE are discussed.
Introduction to High Performance Scientific Computing
This is a textbook that teaches the bridging topics between numerical analysis, parallel computing, code performance, and large-scale applications.
Region-based memory management for expressive GPU programming
Multicore Performance Engineering of Sparse Triangular Solves Using a Modified Roofline Model
  • M. Wittmann, G. Hager, +5 authors G. Wellein
  • Computer Science
  • 2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)
  • 2018
TLDR
This paper tries to establish realistic bandwidth ceilings for the sparse triangular solve step of PARDISO, a leading sparse direct solver package, which is also part of the Intel MKL library.
A Graphical Tool for Performance Analysis of Multicore Systems Based on the Roofline Model
TLDR
An easy-to-use tool that provides an insightful model, making it possible to identify at a glance performance issues such as load balance, locality, and those related to thread and memory allocation.
Performance engineering: a must for petascale and beyond
TLDR
The need for a more principled approach to the management of the performance of applications for petascale platforms is discussed and some initial successes, related to the Blue Waters project, are outlined.

References

SHOWING 1-10 OF 58 REFERENCES
Performance of Synchronized Iterative Processes in Multiprocessor Systems
A general methodology for studying the degree of matching between an architecture and an algorithm is introduced and applied to the case of synchronized iterative algorithms in MIMD machines.
A Hierarchical Approach to Modeling and Improving the Performance of Scientific Applications on the KSR1
TLDR
This work delineates a comprehensive approach to modeling and improving application performance on the KSR1, proposes a workload characterization, and derives upper bounds on the performance of specific machine-workload pairs.
Mapping computational concepts to GPUs
TLDR
This chapter presents intuitive mappings of standard computational concepts onto the special-purpose features of GPUs and introduces a simple GPU programming framework and demonstrates the use of the framework in a short sample program.
Self-Adapting Linear Algebra Algorithms and Software
TLDR
The generation of dense and sparse Basic Linear Algebra Subprograms (BLAS) kernels and the selection of linear solver algorithms are described.
Computer Architecture: A Quantitative Approach
This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important ...
Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities
An instrument for facilitating the calculation of equivalent values includes a plate bearing symbols representing units and dimensions, the plate having a window in which a movable pointer is ...
The SPLASH-2 programs: characterization and methodological considerations
TLDR
This paper quantitatively characterizes the SPLASH-2 programs in terms of fundamental properties and architectural interactions that are important to understand them well, including the computational load balance, communication to computation ratio and traffic needs, important working set sizes, and issues related to spatial locality.
Amdahl's Law in the Multicore Era
  • M. Hill
  • Computer Science
  • Computer
  • 2008
Augmenting Amdahl's law with a corollary for multicore hardware makes it relevant to future generations of chips with multiple processor cores. Obtaining optimal multicore performance will require ...
Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures
  • K. Datta, M. Murphy, +6 authors K. Yelick
  • Computer Science
  • 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
  • 2008
TLDR
This work explores multicore stencil (nearest-neighbor) computations - a class of algorithms at the heart of many structured grid codes, including PDE solvers - and develops a number of effective optimization strategies, and builds an auto-tuning environment that searches over the optimizations and their parameters to minimize runtime, while maximizing performance portability.