Torsten Mehlan

This paper introduces Netgauge, an extensible open-source framework for implementing network benchmarks. The structure of Netgauge abstracts and explicitly separates communication patterns from communication modules. As a result of this separation of concerns, new benchmark types and new network protocols can be added to Netgauge independently. We describe…
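The separation of communication patterns from communication modules can be illustrated with a minimal sketch. The class names `CommModule`, `CommPattern`, `LoopbackModule`, and `PingPong` are hypothetical and do not reflect Netgauge's actual interfaces. The point is only that a pattern is written against an abstract transport, so either side can be replaced without touching the other.

```python
import time
from abc import ABC, abstractmethod


class CommModule(ABC):
    """Hypothetical transport interface (a real module might wrap TCP or InfiniBand)."""

    @abstractmethod
    def send(self, peer: int, data: bytes) -> None: ...

    @abstractmethod
    def recv(self, peer: int) -> bytes: ...


class LoopbackModule(CommModule):
    """Toy in-process transport: one FIFO queue of pending messages per peer."""

    def __init__(self) -> None:
        self.queues: dict[int, list[bytes]] = {}

    def send(self, peer: int, data: bytes) -> None:
        self.queues.setdefault(peer, []).append(data)

    def recv(self, peer: int) -> bytes:
        return self.queues[peer].pop(0)


class CommPattern(ABC):
    """Hypothetical benchmark-pattern interface; it only talks to a CommModule."""

    @abstractmethod
    def run(self, module: CommModule, size: int, reps: int) -> float: ...


class PingPong(CommPattern):
    """Ping-pong between peers 0 and 1; returns the mean one-way time in seconds."""

    def run(self, module: CommModule, size: int, reps: int) -> float:
        msg = bytes(size)
        start = time.perf_counter()
        for _ in range(reps):
            module.send(1, msg)        # ping: deliver the message to peer 1
            echo = module.recv(1)      # peer 1 picks it up
            module.send(0, echo)       # pong: peer 1 replies to peer 0
            module.recv(0)             # peer 0 receives the reply
        return (time.perf_counter() - start) / (2 * reps)
```

For example, `PingPong().run(LoopbackModule(), size=64, reps=1000)` runs the pattern over the toy transport. Supporting a new network would only require another `CommModule` subclass, and a new benchmark only another `CommPattern` subclass.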
Designing a 2048-core high-performance cluster, including an appropriate parallel storage complex and a high-speed network, under a limited budget (€2.6 million) and under performance, thermal, and space constraints is a challenging task. In this paper, we present our design decisions and the reasoning behind them, our experiences during the installation…
Large-scale parallel applications performing global synchronization may spend a significant amount of execution time waiting for the completion of a barrier operation. Consequently, numerous research works have focused on reducing the communication costs of synchronization primitives. However, so far there has been no exhaustive comparison of barrier…
Accurate models of parallel computation are often crucial for optimizing the running time of parallel algorithms. In general, the simpler a model is to use and the fewer its parameters and the interdependencies among them, the more inaccuracies its simplifications introduce. On the other hand, an overly complex model is unusable. We show that it is…
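To make the simplicity/accuracy trade-off concrete, the sketch below evaluates the standard LogP cost expressions and the LogGP extension, which adds a per-byte gap G for long messages. The function names and the parameter values are illustrative assumptions; this is not the model the paper proposes.

```python
def logp_message(L: float, o: float) -> float:
    """LogP time for one small message: send overhead + latency + receive overhead."""
    return o + L + o


def logp_k_messages(L: float, o: float, g: float, k: int) -> float:
    """Arrival time of the last of k small messages sent back-to-back by one process.

    Consecutive injections are separated by max(g, o): the sender is busy for o
    per message and must also respect the gap g between messages.
    """
    return o + (k - 1) * max(g, o) + L + o


def loggp_message(L: float, o: float, G: float, n: int) -> float:
    """LogGP time for one n-byte message; the (n - 1) * G term models bandwidth."""
    return o + (n - 1) * G + L + o


if __name__ == "__main__":
    # Hypothetical parameters in microseconds, not measured values.
    L, o, g, G = 5.0, 1.0, 2.0, 0.001
    print(logp_message(L, o))
    print(logp_k_messages(L, o, g, k=10))
    print(loggp_message(L, o, G, n=65536))
```

Plain LogP assumes small messages, so a long message has to be modeled as many small ones. LogGP captures it directly with one extra parameter, G, at the cost of one more quantity that must be measured on the target system.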
To leverage high-speed interconnects such as InfiniBand, it is important to minimize communication overhead. The most significant source of overhead is the registration of communication memory. In this paper, we present our analysis of the memory registration process inside the Mellanox InfiniBand driver and possible ways around this bottleneck. We evaluate and…
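One commonly used way to amortize registration cost, not necessarily one of the approaches the paper evaluates, is a registration ("pin-down") cache that reuses registrations for buffers that are sent repeatedly. The sketch below is hypothetical: `register` and `deregister` are stand-ins for the expensive driver calls (with libibverbs these would be `ibv_reg_mr` and `ibv_dereg_mr`), and entries are keyed by exact `(address, length)` pairs.

```python
from collections import OrderedDict
from typing import Callable


class RegistrationCache:
    """Hypothetical LRU pin-down cache keyed by exact (address, length) pairs."""

    def __init__(self, capacity: int,
                 register: Callable[[int, int], object],
                 deregister: Callable[[object], None]) -> None:
        self.capacity = capacity
        self.register = register        # stand-in for the expensive registration call
        self.deregister = deregister    # stand-in for releasing a registration
        self.entries: OrderedDict = OrderedDict()  # (addr, length) -> handle, LRU order

    def get(self, addr: int, length: int) -> object:
        key = (addr, length)
        if key in self.entries:
            self.entries.move_to_end(key)          # hit: mark as most recently used
            return self.entries[key]
        if len(self.entries) >= self.capacity:
            _, old = self.entries.popitem(last=False)  # evict least recently used
            self.deregister(old)
        handle = self.register(addr, length)       # miss: pay the registration cost
        self.entries[key] = handle
        return handle
```

A production cache would also have to handle overlapping or contained buffers and invalidate entries when memory is freed or unmapped; otherwise a cached registration could refer to pages that no longer back the buffer.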
The performance of the barrier operation can be crucial for many parallel codes. Distributed shared memory systems in particular must synchronize frequently to ensure the proper ordering of memory accesses. The barrier operation is often implemented on top of point-to-point messages, and the best algorithm scales as $O(\log_2 P \cdot L)$ in the LogP model. We…
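One well-known point-to-point algorithm that attains the $O(\log_2 P \cdot L)$ bound is the dissemination barrier; that it is the algorithm meant here is an assumption. In round $k$, process $i$ signals process $(i + 2^k) \bmod P$, so after $\lceil \log_2 P \rceil$ rounds every process has (transitively) heard from all others. The sketch simulates this knowledge propagation and applies a simple LogP estimate of $L + 2o$ per round, ignoring contention.

```python
def dissemination_rounds(p: int) -> int:
    """Number of rounds, ceil(log2 p), computed exactly with integer arithmetic."""
    return (p - 1).bit_length()


def simulate_dissemination(p: int) -> bool:
    """Return True if every process learns that all p processes have arrived."""
    known = [{i} for i in range(p)]          # arrivals each process knows about
    for k in range(dissemination_rounds(p)):
        dist = 1 << k
        snapshot = [set(s) for s in known]   # all sends in a round are concurrent
        for i in range(p):
            known[(i + dist) % p] |= snapshot[i]   # i signals (i + 2^k) mod p
    return all(len(s) == p for s in known)


def barrier_time_logp(p: int, L: float, o: float) -> float:
    """Estimated LogP cost: each round is one small message, i.e. L + 2o."""
    return dissemination_rounds(p) * (L + 2 * o)
```

For example, with $P = 8$ the barrier needs 3 rounds, so under the hypothetical values $L = 5$ and $o = 1$ the estimate is $3 \cdot (5 + 2) = 21$ time units.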
This paper describes the basic concepts of our approach to improving the performance of Ethernet communication in a Linux cluster environment by introducing Reliable Low Latency Ethernet Sockets. We show that about 25% of the socket latency can be saved by using our simplified protocol. In particular, we emphasize demonstrating that this performance…
We present a micro-benchmark suite to evaluate InfiniBand™ implementations with regard to single-message performance and the addressing of many hosts. We use a 1:n communication pattern to assess latency and bandwidth for all combinations of InfiniBand™ transport services and functions. The results gathered in this study are used to…
Open MPI is a recent open-source development project that combines features of different MPI implementations. These features include fault tolerance, multi-network support, grid support, and a component architecture that ensures extensibility. The TUC Hardware Barrier is a special-purpose low-latency barrier network based on commodity hardware. We show…
The Virtual Interface Architecture (VIA) was introduced to define a common set of features suitable for building high-speed networks. Today the VIA interface serves as an access point to a wide range of system area networks. M-VIA is software that provides the VIA interface on top of several Ethernet cards. The overhead of TCP/IP protocols is avoided…