The Effect of Network Noise on Large-Scale Collective Communications

Abstract

The effect of operating system (OS) noise on the performance of large-scale applications is a growing concern, and ameliorating the influence of OS noise is a subject of active research. A related problem is network noise, which arises when the interconnection network is shared by parallel processes of different allocations or by other background activities. To characterize the effect of network noise on parallel applications, we conducted a series of experiments with a specially crafted benchmark and simulations. Experimental results show a decrease in the communication performance of a parallel reduction operation by a factor of 2 on 246 nodes on an InfiniBand fat-tree and by several orders of magnitude on a BlueGene/P torus. Simulations show how the influence of network noise grows with the system size. Although network noise is not as well studied as OS noise, our results clearly show that it is an important factor that must be considered when running and analyzing large-scale applications.

DOI: 10.1142/S0129626409000420

18 Figures and Tables

Cite this paper

@article{Hoefler2009TheEO,
  title={The Effect of Network Noise on Large-Scale Collective Communications},
  author={Torsten Hoefler and Timo Schneider and Andrew Lumsdaine},
  journal={Parallel Processing Letters},
  year={2009},
  volume={19},
  pages={573--593}
}