Bernard Tourancheau

High-speed networks now provide very high performance. Software evolves slowly, and the old protocol stacks are no longer adequate for such communication speeds. As bandwidth increases, latency should decrease proportionally to keep the system balanced. With current network technology, the main bottleneck is most often the…
Emerging many-core processors, such as CUDA-capable nVidia GPUs, are promising platforms for regular parallel algorithms such as the Lattice Boltzmann Method (LBM). Since global memory on graphics devices has high latency and the LBM is data-intensive, the memory access pattern is critical to achieving good performance. Whenever possible, global memory loads…
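A common way to obtain coalesced global-memory loads in GPU LBM codes is to store the particle distributions as a structure of arrays (SoA) rather than an array of structures (AoS). The sketch below assumes a D2Q9 lattice and one thread per lattice node (both assumptions for illustration, not details from the abstract) and compares the address stride seen by consecutive threads under each layout. A unit stride is what allows the loads of a warp to be combined into few memory transactions.

```python
# Hypothetical illustration: element indices read by consecutive GPU threads
# for two LBM storage layouts. D2Q9 and one-thread-per-node are assumptions.
Q = 9  # number of discrete velocities in a D2Q9 lattice


def aos_index(node: int, q: int) -> int:
    """Array of structures: the Q distributions of one node are contiguous."""
    return node * Q + q


def soa_index(node: int, q: int, n_nodes: int) -> int:
    """Structure of arrays: distribution q of all nodes is contiguous."""
    return q * n_nodes + node


def addresses_for_warp(layout: str, q: int, warp: int = 32, n_nodes: int = 1024) -> list:
    """Indices read when threads 0..warp-1 each load distribution q of node t."""
    if layout == "aos":
        return [aos_index(t, q) for t in range(warp)]
    return [soa_index(t, q, n_nodes) for t in range(warp)]


def stride(addrs: list) -> int:
    """Distance between the elements read by two neighbouring threads."""
    return addrs[1] - addrs[0]


if __name__ == "__main__":
    print("AoS stride:", stride(addresses_for_warp("aos", q=1)))  # 9: scattered loads
    print("SoA stride:", stride(addresses_for_warp("soa", q=1)))  # 1: coalescable loads
```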
High-speed networks now provide very high performance. Software evolves slowly, and the old protocol stacks are no longer adequate for such communication speeds. As bandwidth increases, latency should decrease proportionally to keep the system balanced. With current network technology, the main bottleneck is most often the…
High-speed networks now provide very high performance. Software evolves slowly, and the old protocol stacks are no longer adequate for such communication speeds. As throughput increases, latency should decrease proportionally to keep the system balanced. With current network technology, the main bottleneck is most often…
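The balance argument can be made concrete with the usual linear cost model T(n) = L + n/B for an n-byte message, latency L and asymptotic bandwidth B. The message size that reaches half of B is n_1/2 = L × B, so if B grows while L stays fixed, ever larger messages are needed before the faster link pays off. A minimal sketch under that model; the numbers are illustrative assumptions, not measurements:

```python
# Linear latency/bandwidth model. Units: time in microseconds, bandwidth in
# bytes per microsecond (1 byte/us = 1 MB/s), so the arithmetic stays exact.


def transfer_time(n_bytes: float, latency_us: float, bandwidth_Bpus: float) -> float:
    """T(n) = L + n / B."""
    return latency_us + n_bytes / bandwidth_Bpus


def effective_throughput(n_bytes: float, latency_us: float, bandwidth_Bpus: float) -> float:
    """Bytes per microsecond actually achieved for one n-byte message."""
    return n_bytes / transfer_time(n_bytes, latency_us, bandwidth_Bpus)


def half_bandwidth_size(latency_us: float, bandwidth_Bpus: float) -> float:
    """n_1/2: solving n / (L + n/B) = B/2 gives n = L * B."""
    return latency_us * bandwidth_Bpus


if __name__ == "__main__":
    print(half_bandwidth_size(10, 100))           # 1000 bytes at 100 MB/s, 10 us
    print(half_bandwidth_size(10, 1000))          # 10000 bytes: 10x bandwidth, same latency
    print(effective_throughput(1000, 10, 100))    # 50.0, i.e. half of B
```

With L = 10 µs and B = 100 bytes/µs (100 MB/s), n_1/2 = 1000 bytes; raising B tenfold without reducing L pushes n_1/2 to 10000 bytes. Keeping n_1/2 constant requires L to shrink by the same factor that B grows.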
Block-cyclic distribution appears well suited to most linear algebra algorithms, and this type of data distribution was chosen for the ScaLAPACK library as well as for the HPF language. However, the block size must be chosen as a compromise (to achieve good computation and communication efficiency as well as good load balancing). This choice heavily…
Implementing linear algebra kernels on distributed-memory parallel computers raises the problem of distributing matrices and vectors among the processors. Block-cyclic distribution appears well suited to most algorithms. However, the block size must be chosen as a compromise (to achieve good computation and communication efficiency and a…
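The index mapping behind a one-dimensional block-cyclic distribution can be sketched as follows, assuming blocks of nb consecutive indices dealt round-robin to p processes (ScaLAPACK applies the same rule independently along each dimension of a two-dimensional process grid). The function names are illustrative only. Setting nb = 1 gives a purely cyclic layout (finest load balance), while nb = ceil(n/p) gives a purely blocked layout (largest contiguous blocks), which is the trade-off described above.

```python
# 1D block-cyclic mapping: global index i, block size nb, p processes.
# Function names are hypothetical, chosen for this illustration.


def owner(i: int, nb: int, p: int) -> int:
    """Process that owns global index i."""
    return (i // nb) % p


def local_index(i: int, nb: int, p: int) -> int:
    """Position of global index i inside its owner's local array."""
    block = i // nb
    return (block // p) * nb + i % nb


def global_index(l: int, proc: int, nb: int, p: int) -> int:
    """Inverse mapping: local index l on process proc back to the global index."""
    return ((l // nb) * p + proc) * nb + l % nb


if __name__ == "__main__":
    n, nb, p = 12, 3, 4
    for i in range(n):
        print(i, "-> process", owner(i, nb, p), "local", local_index(i, nb, p))
```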
Figure 1: The architecture of MPI-BIP (layers: generic ADI code for datatype management, heterogeneity and request-queue management; Channel Interface and Device Interface, with Check_incoming; a protocol interface with "short", "eager" and "rendez-vous" protocols; ports including NX, P4 TCP/IP, Paragon, SP/2, MPL, SGI, shared memory and BIP).
… implemented with one or several messages of the underlying…
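The "short", "eager" and "rendez-vous" protocols named in the figure are typically selected according to message size. The sketch below shows such a dispatch; the thresholds are hypothetical placeholders rather than values taken from MPI-BIP, and the comments describe the usual meaning of each protocol in MPICH-style implementations.

```python
# Hypothetical size thresholds in bytes; MPI-BIP's actual values are not given here.
SHORT_MAX = 1024
EAGER_MAX = 16 * 1024


def choose_protocol(size: int, short_max: int = SHORT_MAX, eager_max: int = EAGER_MAX) -> str:
    """Pick a send protocol from the message size (MPICH-style scheme)."""
    if size < 0:
        raise ValueError("message size must be non-negative")
    if size <= short_max:
        # Payload travels in the same packet as the message envelope.
        return "short"
    if size <= eager_max:
        # Data is sent immediately; the receiver buffers it if no matching
        # receive has been posted yet.
        return "eager"
    # Sender first asks permission and transfers only once the receiver is
    # ready, avoiding a large temporary buffer on the receiving side.
    return "rendez-vous"


if __name__ == "__main__":
    for size in (64, 4096, 1 << 20):
        print(size, choose_protocol(size))
```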
In this paper we present a scalable protocol for conducting periodic probes of network performance in a way that minimizes collisions between separate probes. The goal of the protocol is to enable active performance monitoring of large-scale distributed computational systems and networks. We use the protocol to generate time series of measurement data that…
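The abstract does not describe how the protocol avoids collisions. Purely as a baseline for comparison, the simplest collision-free scheme is a centralized time-slot schedule that gives every probe its own slot within the measurement period; the hypothetical sketch below builds such a schedule and checks it for overlaps. It is not the paper's protocol, and a central scheduler like this becomes a bottleneck as the number of hosts grows.

```python
# Hypothetical baseline: centralized time-slot schedule for periodic probes.
# Each probe (e.g. a host pair) repeats every `period` seconds and lasts `duration`.


def slot_schedule(probes: list, period: float, duration: float) -> dict:
    """Give each probe a disjoint [start, start + duration) slot inside one period."""
    if len(probes) * duration > period:
        raise ValueError("period too short to fit every probe without overlap")
    return {probe: k * duration for k, probe in enumerate(probes)}


def collisions(schedule: dict, duration: float) -> list:
    """Adjacent (in start order) probes whose equal-length slots overlap.

    With equal durations, any overlap implies an overlap between two
    neighbours in start order, so an empty result means no collisions.
    """
    items = sorted(schedule.items(), key=lambda kv: kv[1])
    return [(a, b) for (a, sa), (b, sb) in zip(items, items[1:]) if sb < sa + duration]


if __name__ == "__main__":
    pairs = [("a", "b"), ("b", "c"), ("a", "c")]
    sched = slot_schedule(pairs, period=60, duration=5)
    print(sched)
    print("collisions:", collisions(sched, 5))
```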
The lattice Boltzmann method (LBM) is an innovative and promising approach in computational fluid dynamics. From an algorithmic standpoint, it reduces to a regular data-parallel procedure and is therefore well suited to high-performance computing. Numerous works report efficient implementations of the LBM on the GPU, but very few mention multi-GPU…
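To illustrate why the LBM reduces to a regular data-parallel procedure, here is a minimal single-process sketch on a one-dimensional D1Q3 lattice with BGK collision and periodic streaming (the lattice, relaxation time and initial condition are assumptions chosen for illustration). Every node performs the same local collision, and streaming only shifts each population array by one cell, which is what makes the method easy to split across GPU threads or across several GPUs.

```python
# D1Q3 lattice: one rest population plus one moving right and one moving left.
C = (0, 1, -1)             # discrete velocities
W = (2 / 3, 1 / 6, 1 / 6)  # lattice weights; sound speed squared c_s^2 = 1/3


def equilibrium(rho: float, u: float) -> list:
    """Second-order equilibrium f_i^eq = w_i rho (1 + 3 c_i u + 4.5 (c_i u)^2 - 1.5 u^2)."""
    return [w * rho * (1 + 3 * c * u + 4.5 * (c * u) ** 2 - 1.5 * u * u) for w, c in zip(W, C)]


def step(f: list, tau: float) -> list:
    """One BGK collide-and-stream step on a periodic lattice; f[i][x] is population i at node x."""
    n = len(f[0])
    post = [[0.0] * n for _ in C]
    # Collision: purely local and identical at every node (the data-parallel part).
    for x in range(n):
        rho = sum(f[i][x] for i in range(len(C)))
        u = sum(C[i] * f[i][x] for i in range(len(C))) / rho
        feq = equilibrium(rho, u)
        for i in range(len(C)):
            post[i][x] = f[i][x] - (f[i][x] - feq[i]) / tau
    # Streaming: population i moves C[i] cells; periodic boundaries via modulo.
    return [[post[i][(x - C[i]) % n] for x in range(n)] for i in range(len(C))]


def totals(f: list) -> tuple:
    """Total mass and total momentum on the lattice."""
    mass = sum(sum(row) for row in f)
    momentum = sum(C[i] * sum(f[i]) for i in range(len(C)))
    return mass, momentum


if __name__ == "__main__":
    n, tau = 32, 0.8
    # Initial condition: a small density bump at rest, populations at equilibrium.
    rho0 = [1.1 if x == n // 2 else 1.0 for x in range(n)]
    f = [list(pop) for pop in zip(*(equilibrium(r, 0.0) for r in rho0))]
    print("before:", totals(f))
    for _ in range(100):
        f = step(f, tau)
    print("after: ", totals(f))  # mass and momentum conserved up to rounding
```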