Assessing the Performance of the SRR Loop Scheduler with Irregular Workloads
In High-Performance Computing, an application's workload must be evenly balanced among threads to deliver high performance and scalability. In OpenMP, the load balancing problem arises when loop iterations are scheduled to threads. Several scheduling strategies have been proposed in this context, but they do not take into account the input workload of the application and thus turn out to be suboptimal. In this work, we introduce a design methodology to propose, study, and assess the performance of workload-aware loop scheduling strategies. In this methodology, a Genetic Algorithm is employed to explore the solution space of the problem and to guide the design of new loop scheduling strategies, and a simulator is used to evaluate their performance. As a proof of concept, we show how the methodology was used to propose and study a new workload-aware loop scheduling strategy named Smart Round-Robin (SRR). We implemented this strategy in GCC's OpenMP runtime and carried out several experiments to validate the simulator and to evaluate the performance of SRR. Our experimental results show that SRR may deliver up to 37.89% and 14.10% better performance than OpenMP's Dynamic loop scheduling strategy in the simulated environment and in a real-world application kernel, respectively.
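To make the load balancing problem concrete, the following minimal Python sketch simulates the assignment of loop iterations to threads and compares the resulting makespan (the load of the busiest thread) under two policies. It is an illustration only, not the SRR strategy or the simulator described above: `static_schedule` mimics a workload-oblivious block partitioning, while `workload_aware_schedule` is a generic greedy heuristic (heaviest iteration first, onto the least-loaded thread) chosen here purely as an assumption. All function names and the sample workload are hypothetical.

```python
import heapq


def static_schedule(num_iterations, num_threads):
    """Assign contiguous blocks of iterations to threads, ignoring workload."""
    chunk = -(-num_iterations // num_threads)  # ceiling division
    return [i // chunk for i in range(num_iterations)]


def workload_aware_schedule(workloads, num_threads):
    """Greedy heuristic (an assumption, not SRR): place the heaviest
    remaining iteration on the currently least-loaded thread."""
    heap = [(0, t) for t in range(num_threads)]  # (accumulated load, thread id)
    assignment = [0] * len(workloads)
    for i in sorted(range(len(workloads)), key=lambda i: -workloads[i]):
        load, t = heapq.heappop(heap)
        assignment[i] = t
        heapq.heappush(heap, (load + workloads[i], t))
    return assignment


def makespan(workloads, assignment, num_threads):
    """Return the load of the busiest thread for a given assignment."""
    loads = [0] * num_threads
    for w, t in zip(workloads, assignment):
        loads[t] += w
    return max(loads)


if __name__ == "__main__":
    # Hypothetical irregular workload: cost of each loop iteration.
    workloads = [8, 7, 6, 5, 1, 1, 1, 1]
    threads = 2
    print(makespan(workloads, static_schedule(len(workloads), threads), threads))      # 26
    print(makespan(workloads, workload_aware_schedule(workloads, threads), threads))   # 15
```

With this skewed input, the block partitioning leaves one thread with 26 units of work and the other with 4, whereas the workload-aware assignment reaches the ideal split of 15 units per thread. This gap is the kind of imbalance that workload-aware strategies aim to close.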