SCALASCA is a performance toolset that has been specifically designed to analyze parallel application behavior on large-scale systems, but is also well-suited for small- and medium-scale HPC platforms. SCALASCA offers an incremental performance-analysis process that integrates runtime summaries with in-depth studies of concurrent behavior via event tracing, …
Performance analysis of applications on supercomputers requires scalable tools. The Periscope environment applies a distributed automatic online analysis and thus scales to thousands of processors. This article gives an overview of the Periscope system, from the performance property specification, via the search process, to the integration with two …
Analyzing the scalability behavior and the overheads of OpenMP applications is an important step in the development process of scientific software. Unfortunately, few tools are available that allow an exact quantification of OpenMP-related overheads and scalability characteristics. We present a methodology in which we define four overhead categories that …
As supercomputers are being built from an ever-increasing number of processing elements, the effort required to achieve a substantial fraction of the system peak performance is continuously growing. Tools are needed that give developers and computing center staff holistic indicators about the resource consumption of applications and potential performance …
Tasking in OpenMP 3.0 allows irregular parallelism to be expressed much more easily, and it is expected to be a major step towards the widespread adoption of OpenMP for multicore programming. We discuss the issues encountered in providing monitoring support for tasking in an existing OpenMP profiling tool with respect to instrumentation, measurement, and …
Performance analysis for terascale computing requires a combination of new concepts including distribution, on-line processing and automation. As a foundation for tools realizing these concepts, we present a distributed monitoring approach for clustered SMP architectures that tries to minimize the perturbation of the target application while retaining …
In the last two decades supercomputers have sustained a remarkable growth in performance that even outperformed the predictions of Moore's law, primarily due to increased levels of parallelism [19]. As industry and academia try to come up with viable approaches for exascale systems, attention turns to energy efficiency as the primary design consideration. …
We present a detailed investigation of the scalability characteristics of the SPEC OpenMP benchmarks on large-scale shared memory multiprocessor machines. Our study is based on a tool that quantifies four well-defined overhead classes that can limit scalability – for each parallel region separately and for the application as a whole.
Profiling is often the method of choice for performance analysis of parallel applications due to its low overhead and easily comprehensible results. However, a disadvantage of profiling is the loss of temporal information, which makes it impossible to causally relate performance phenomena to events that happened earlier or later during execution. We …