Learn More
SCALASCA is a performance toolset that has been specifically designed to analyze parallel application behavior on large-scale systems, but is also well-suited for small-and medium-scale HPC platforms. SCALASCA offers an incremen-tal performance-analysis process that integrates runtime summaries with in-depth studies of concurrent behavior via event tracing,(More)
Performance analysis of applications on supercomputers require scalable tools. The Periscope environment applies a distributed automatic online analysis and thus scales to thousands of processors. This article gives an overview of the Periscope system, from the performance property specification , via the search process, to the integration with two(More)
Analyzing the scalability behavior and the overheads of Open-MP applications is an important step in the development process of scientific software. Unfortunately, few tools are available that allow an exact quantification of OpenMP related overheads and scalability characteristics. We present a methodology in which we define four overhead categories that(More)
As supercomputers are being built from an ever increasing number of processing elements, the effort required to achieve a substantial fraction of the system peak performance is continuously growing. Tools are needed that give developers and computing center staff holistic indicators about the resource consumption of applications and potential performance(More)
Tasking in OpenMP 3.0 allows irregular parallelism to be expressed much more easily and it is expected to be a major step towards the widespread adoption of OpenMP for multicore programming. We discuss the issues encountered in providing monitoring support for tasking in an existing OpenMP profiling tool with respect to instrumentation, measurement, and(More)
Performance analysis for terascale computing requires a combination of new concepts including distribution, on-line processing and automation. As a foundation for tools realizing these concepts, we present a distributed monitoring approach for clustered SMP architectures that tries to minimize the perturbation of the target application while retaining(More)
We present a detailed investigation of the scalability characteristics of the SPEC OpenMP benchmarks on large-scale shared memory multiprocessor machines. Our study is based on a tool that quantifies four well-defined overhead classes that can limit scalability – for each parallel region separately and for the application as a whole.
DASH is a realization of the PGAS (partitioned global address space) model in the form of a C++ template library. Operator overloading is used to provide global-view PGAS semantics without the need for a custom PGAS (pre-)compiler. The DASH library is implemented on top of our runtime system DART, which provides an abstraction layer on top of existing(More)