Traditional parallel applications have exploited regular parallelism, based on parallel loops. Only a few applications exploit sections parallelism. With the release of the new OpenMP specification (3.0), this programming model supports tasking. Parallel tasks allow the exploitation of irregular parallelism, but there is a lack of benchmarks exploiting …
Clusters of GPUs are emerging as a new computational scenario. Programming them requires the use of hybrid models that increase the complexity of the applications, reducing the productivity of programmers. We present the implementation of OmpSs for clusters of GPUs, which supports asynchrony and heterogeneity for task parallelism. It is based on annotating …
OpenMP is still in the process of being defined and extended to broaden the range of applications and parallelization strategies it can be used for. The proposal of OpenMP extensions may require the implementation of new features in the runtime system supporting the OpenMP parallel execution and modifications in an existing OpenMP compiler, either at the …
BlueGene/L is currently the world's fastest supercomputer. It consists of a large number of low-power dual-processor compute nodes interconnected by high-speed torus and collective networks. Because compute nodes do not have shared memory, MPI is the natural programming model for this machine. The BlueGene/L MPI library is a port of MPICH2. In this paper …
Power modeling based on performance monitoring counters (PMCs) has attracted the interest of researchers since it became a quick approach to understand and analyse power behavior on real systems. As a result, several power-aware policies use power models to guide their decisions and to trigger low-level mechanisms such as voltage and frequency scaling. Hence, …
In this paper we describe the design and implementation of a user-level thread package based on the nano-threads programming model, whose goal is to efficiently manage the application parallelism at user-level. Nano-thread applications work close to the operating system to quickly adapt to resource availability. The goal is to obtain an efficient parallel …
In this paper we describe an implementation overview of Nanos v4: an OpenMP Run Time Library (RTL) based on the nano-threads programming model. Our main goal is to discuss different aspects of the library development focusing on the implementation of a new feature introduced in the last OpenMP release: task support. We compare the performance of our …
Modern GPUs have evolved into fully programmable parallel stream multiprocessors. Due to the nature of graphics workloads, computer vision algorithms are in a good position to leverage the computing power of these devices. An interesting problem that greatly benefits from parallelism is face detection. This paper presents a highly optimized Haar-based face …