The boat hull model: adapting the roofline model to enable performance prediction for parallel computing

@inproceedings{Nugteren2012TheBH,
  title={The boat hull model: adapting the roofline model to enable performance prediction for parallel computing},
  author={Cedric Nugteren and Henk Corporaal},
  booktitle={PPoPP '12},
  year={2012}
}
Multi-core and many-core architectures have been major trends for the past six years and are expected to remain so for the coming decades. With these trends in parallel computing, it becomes increasingly difficult to decide on which architecture to run a given application. In this work, we use an algorithm classification to predict performance prior to algorithm implementation. For this purpose, we modify the roofline model to include class information. In this way, we enable architectural choice through…

Performance Optimization on GPGPU & Multicore CPU Using Roofline Model
This paper uses the roofline model to evaluate which platform is best optimized for training a neural network that recognizes handwritten digits, comparing multicore CPU and general-purpose…
Performance Modeling for FPGAs: Extending the Roofline Model with High-Level Synthesis Tools
TLDR
This paper proposes the combination of the high-level synthesis (HLS) tools and the roofline model, in order to construct a performance model for FPGAs which is able to visually condense all the helpful information for the designer.
Microbenchmarks for GPU Characteristics: The Occupancy Roofline and the Pipeline Model
TLDR
This paper presents microbenchmarks in OpenCL to measure the most important performance characteristics of GPUs: the influence of independent instructions within a kernel and thread divergence. It argues that these are the most important characteristics for understanding and predicting performance.
An Extended Roofline Model with Communication-Awareness for Distributed-Memory HPC Systems
TLDR
A simple and intuitive graphical model that extends the widely used Roofline performance model to include communication cost in addition to memory access time and peak CPU performance, enabling performance evaluation along a third dimension: communication performance.
X-MAP A Performance Prediction Tool for Porting Algorithms and Applications to Accelerators
TLDR
X-MAP is an easy-to-use graphical user interface (GUI) tool for predicting performance when porting algorithms and applications to accelerators. It uses a machine-learning-based inference model to predict an application's performance on a number of well-known accelerators and, at the same time, the best architecture and programming language for that application.
A Robust Methodology for Performance Analysis on Hybrid Embedded Multicore Architectures
TLDR
The methodology builds a complete model of the computing architecture using three levels of tests, each characterizing a specific situation representative of real applications, and aims to predict performance for different applications.
Cache-aware Roofline model: Upgrading the loft
TLDR
This paper analyzes the original Roofline model and proposes a novel approach to provide a more insightful performance modeling of modern architectures by introducing cache-awareness, thus significantly improving the guidelines for application optimization.
A modular and parameterisable classification of algorithms
TLDR
A new algorithm classification is introduced that uses a limited vocabulary and a well-defined grammar. The resulting classification is modular and parameterisable, which makes it very fine-grained and widely applicable.
Performance Prediction for Multi-Application Concurrency on GPUs
TLDR
This work proposes the first machine learning based predictor to predict the performance of an ensemble of applications on a GPU, and achieves an error of 9% across a suite of representative vision workloads for predicting the execution time.
Cross-System Runtime Prediction of Parallel Applications on Multi-Core Processors
TLDR
The authors argue that applications needing this kind of treatment are sufficiently sophisticated and, especially when commercial, most likely black boxes. They therefore avoid any static analysis of the applications and rely expressly on the parallel runtimes of individual executions.

References

A modular and parameterisable classification of algorithms
TLDR
A new algorithm classification is introduced that uses a limited vocabulary and a well-defined grammar. The resulting classification is modular and parameterisable, which makes it very fine-grained and widely applicable.
Roofline: an insightful visual performance model for multicore architectures
TLDR
The Roofline model offers insight on how to improve the performance of software and hardware in the rapidly changing world of connected devices.
GPUs and the Future of Parallel Computing
TLDR
The capabilities of state-of-the-art GPU-based high-throughput computing systems are discussed and the challenges to scaling single-chip parallel-computing systems are considered, highlighting high-impact areas that the computing research community can address.