Raymond Namyst

In the field of HPC, the current hardware trend is to design multiprocessor architectures that feature heterogeneous technologies such as specialized coprocessors (e.g., Cell/BE SPUs) or data-parallel accelerators (e.g., GPGPUs). Approaching the theoretical performance of these architectures is a complex issue. Indeed, substantial efforts have already been …
Large-scale distributed systems like Grids are difficult to study using only theoretical models and simulators. Most Grids deployed at large scale are production platforms, which makes them inappropriate research tools because of their limited reconfiguration, control and monitoring capabilities. In this paper, we present Grid’5000, a 5000-CPU nation-wide …
Large-scale distributed systems like Grids combine several characteristics that make them difficult to study using only theoretical models and simulators. Most Grids deployed at large scale are production platforms, which makes them inappropriate research tools because of their limited reconfiguration, control and monitoring capabilities. In this paper, we present …
The increasing number of cores, shared caches and memory nodes within machines introduces a complex hardware topology. High-performance computing applications now have to carefully adapt their placement and behavior according to the underlying hierarchy of hardware resources and their software affinities. We introduce the Hardware Locality (hwloc) software …
This paper introduces a version of MPICH that efficiently handles several different networks simultaneously. The core of the implementation relies on a device called ch_mad, which is based on a generic multiprotocol communication library called Madeleine. The performance achieved with tested networks such as Fast-Ethernet, Scalable Coherent Interface or Myrinet is very …
In this paper we present PM2, a system environment which aims to support the execution of parallel applications on distributed architectures. In particular, we focus on parallel applications that solve irregular problems, e.g. problems whose parallel decomposition is highly dynamic and not predictable. In the first part we discuss the major drawbacks …
Communication libraries have made dramatic progress over the last fifteen years, pushed by the success of cluster architectures as the preferred platform for high-performance distributed computing. However, many potential optimizations are left unexplored in the process of mapping application communication requests onto low-level network commands. The …
To fully tap into the potential of heterogeneous machines composed of multicore processors and multiple accelerators, simple offloading approaches, in which the main trunk of the application runs on regular cores while only specific parts are offloaded to accelerators, are not sufficient. The real challenge is to build systems where the application would …
Due to their ever-growing success in the development of distributed applications, today's multithreaded environments have to be highly portable and efficient on a large variety of hardware. Most of these environments have an implementation built on top of standard communication interfaces such as PVM or MPI, which are widely available on existing …