In the field of HPC, the current hardware trend is to design multiprocessor architectures that feature heterogeneous technologies such as specialized coprocessors (e.g., Cell/BE SPUs) or data-parallel accelerators (e.g., GPGPUs). Approaching the theoretical performance of these architectures is a complex issue. Indeed, substantial efforts have already been…
Large-scale distributed systems such as Grids have several characteristics that make them difficult to study using theoretical models and simulators alone. Most Grids deployed at large scale are production platforms, which makes them inappropriate research tools because of their limited reconfiguration, control, and monitoring capabilities. In this paper, we present…
Thanks to recent advances in virtualization technologies, it is now possible to benefit from the flexibility brought by virtual machines at little cost in terms of CPU performance. However, on HPC clusters, some overheads remain that prevent widespread use of virtualization. In this article, we tackle the issue of inter-VM MPI communications when VMs are…
In this paper, we present PM2, a system environment that aims to support the execution of parallel applications on distributed architectures. In particular, we focus on parallel applications that solve irregular problems, e.g., problems whose parallel decomposition is highly dynamic and unpredictable. In the first part, we discuss the major drawbacks…
The increasing number of cores, shared caches, and memory nodes within machines introduces a complex hardware topology. High-performance computing applications now have to carefully adapt their placement and behavior according to the underlying hierarchy of hardware resources and their software affinities. We introduce the Hardware Locality (hwloc)…
In this chapter, we present a hybridization methodology for the development of linear algebra software for GPUs. The methodology is successfully used in MAGMA, a new generation of linear algebra libraries similar in functionality to LAPACK but extended for hybrid, GPU-based systems. Algorithms of interest are split into computational tasks. The tasks'…
To fully tap into the potential of heterogeneous machines composed of multicore processors and multiple accelerators, simple offloading approaches, in which the main trunk of the application runs on regular cores while only specific parts are offloaded to accelerators, are not sufficient. The real challenge is to build systems where the application would…