Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm

Abstract

Scalability of Krylov subspace methods suffers from costly global synchronization steps that arise in dot-products and norm calculations on parallel machines. In this work, a modified Conjugate Gradient (CG) method is presented that removes the costly global synchronization steps from the standard CG algorithm by performing only a single non-blocking reduction per iteration. This global communication phase can be overlapped with the matrix-vector product, which typically requires only local communication. The resulting algorithm will be referred to as pipelined CG. An alternative pipelined method, mathematically equivalent to the Conjugate Residual method, that makes different trade-offs with regard to scalability and serial runtime is also considered. These methods are compared to an asynchronous CG algorithm recently proposed by B. Gropp. Extensive numerical experiments demonstrate the numerical stability of the methods. Moreover, it is shown that hiding the global synchronization step improves scalability on distributed memory machines using the message passing paradigm and leads to significant speedups compared to standard CG.
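The abstract does not list the recurrences themselves, so the following is only a minimal serial sketch of one commonly stated unpreconditioned form of pipelined CG, written in plain Python. The helper names (`matvec`, `dot`, `axpy`, `pipelined_cg`) and the stopping test are assumptions made for illustration. The point it tries to show is that both dot-products of an iteration are formed together, and that the matrix-vector product following them does not depend on their results, which is what would allow a single non-blocking reduction (for example MPI's `MPI_Iallreduce`) to be overlapped with it on a parallel machine.

```python
def matvec(A, v):
    """Dense matrix-vector product; stands in for a sparse, mostly local SpMV."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    """Dot product; in a distributed setting this needs a global reduction."""
    return sum(a * b for a, b in zip(u, v))

def axpy(alpha, x, y):
    """Return y + alpha * x as a new list."""
    return [yi + alpha * xi for xi, yi in zip(x, y)]

def pipelined_cg(A, b, x0=None, tol=1e-10, max_iter=200):
    """Illustrative serial pipelined CG for a symmetric positive definite A."""
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n
    r = axpy(-1.0, matvec(A, x), b)      # r = b - A x
    w = matvec(A, r)                     # w = A r
    z = [0.0] * n                        # z tracks A s
    s = [0.0] * n                        # s tracks A p
    p = [0.0] * n                        # search direction
    gamma_old = alpha_old = None
    for i in range(max_iter):
        # Both dot-products are formed at the same point, so in parallel
        # they could share one non-blocking reduction started here.
        gamma = dot(r, r)
        delta = dot(w, r)
        # This product uses neither gamma nor delta, so it could run
        # while the reduction is still in flight.
        q = matvec(A, w)
        # A parallel code would wait for the reduction to complete here.
        if gamma ** 0.5 <= tol:
            return x, i
        if i == 0:
            beta = 0.0
            alpha = gamma / delta
        else:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha_old)
        z = axpy(beta, z, q)             # z = q + beta z
        s = axpy(beta, s, w)             # s = w + beta s
        p = axpy(beta, p, r)             # p = r + beta p
        x = axpy(alpha, p, x)            # x = x + alpha p
        r = axpy(-alpha, s, r)           # r = r - alpha s
        w = axpy(-alpha, z, w)           # w = w - alpha z
        gamma_old, alpha_old = gamma, alpha
    return x, max_iter

if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    solution, iterations = pipelined_cg(A, b)
    print(solution, iterations)
```

In this serial form nothing is actually overlapped; the comments only mark where the reduction would start and complete. The extra vectors `z`, `s` and `w` are the price of the reordering: they add vector updates per iteration compared with standard CG, which is consistent with the abstract's remark about trade-offs between scalability and serial runtime.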

DOI: 10.1016/j.parco.2013.06.001

6 Figures and Tables

Citations per Year (chart covering 2015 to 2017; values not recoverable)

Citation Velocity: 14

Averaging 14 citations per year over the last 3 years.


Cite this paper

@article{Ghysels2014HidingGS, title={Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm}, author={Pieter Ghysels and Wim Vanroose}, journal={Parallel Computing}, year={2014}, volume={40}, pages={224-238} }