Performance of Low Synchronization Orthogonalization Methods in Anderson Accelerated Fixed Point Solvers

Shelby Lockhart, David J. Gardner, Carol S. Woodward, Stephen J. Thomas, Luke N. Olson
Anderson Acceleration (AA) is a method for accelerating the convergence of fixed-point iterations for nonlinear algebraic systems of equations. Because it requires solving a least squares problem at each iteration and relies on modified Gram-Schmidt for updating the iteration space, AA incurs extra costly synchronization steps for global reductions. Moreover, the number of reductions per iteration depends on the size of the iteration space. In this work, we introduce three low…
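To make the structure concrete, here is a minimal, hedged sketch of (type-II, undamped) Anderson Acceleration in NumPy. The function name and parameters are illustrative, not from the paper; the small least squares solve in the loop is the step whose orthogonalization costs the synchronizations discussed in the abstract.

```python
import numpy as np

def anderson(g, x0, m=3, tol=1e-10, max_iter=100):
    """Minimal Anderson Acceleration sketch for the fixed-point problem x = g(x).

    Keeps the last m iterate/residual differences and solves a small least
    squares problem each iteration -- in a distributed setting, the
    orthogonalization inside that solve is what forces global reductions.
    """
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    f = g(x) - x                     # residual f(x) = g(x) - x
    dX, dF = [], []                  # histories of iterate and residual differences
    for _ in range(max_iter):
        if np.linalg.norm(f) < tol:
            break
        if dF:
            Fmat = np.column_stack(dF)
            Xmat = np.column_stack(dX)
            # Least squares solve: min_gamma || f - Fmat @ gamma ||_2
            gamma, *_ = np.linalg.lstsq(Fmat, f, rcond=None)
            x_new = x + f - (Xmat + Fmat) @ gamma
        else:
            x_new = x + f            # plain fixed-point step to start
        f_new = g(x_new) - x_new
        dX.append(x_new - x)
        dF.append(f_new - f)
        if len(dX) > m:              # truncate to the last m differences
            dX.pop(0)
            dF.pop(0)
        x, f = x_new, f_new
    return x
```

For example, `anderson(np.cos, [1.0])` converges to the fixed point of `cos`, approximately 0.73909.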


Post-Modern GMRES
The GMRES algorithm of Saad and Schultz (1986) for nonsymmetric linear systems relies on the Arnoldi expansion of the Krylov basis. The algorithm computes the QR factorization of the matrix B = [r0, AVm].


Considerations on the implementation and use of Anderson acceleration on distributed memory and GPU-based parallel computers
Performance results show that for sufficiently large problems a GPU implementation of Anderson acceleration can provide a significant performance increase over CPU versions due to the GPU’s higher memory bandwidth.
Anderson Acceleration for Fixed-Point Iterations
It is shown that, on linear problems, Anderson acceleration without truncation is “essentially equivalent” in a certain sense to the generalized minimal residual (GMRES) method and the Type 1 variant in the Fang-Saad Anderson family is similarly essentially equivalent to the Arnoldi (full orthogonalization) method.
On the Influence of the Orthogonalization Scheme on the Parallel Performance of GMRES
It is shown that the iterative classical Gram-Schmidt method outperforms its three competitors in speed and parallel scalability while maintaining robust numerical properties.
Enabling GPU Accelerated Computing in the SUNDIALS Time Integration Library
Solving linear least squares problems by Gram-Schmidt orthogonalization
If inner products are accumulated in double precision, then the errors in the computed x and r are less than the errors resulting from some simultaneous initial perturbation δA, δb whose size is determined by the condition of the linear least squares problem.
Rounding error analysis of the classical Gram-Schmidt orthogonalization process
It is shown that, provided the initial set of vectors has numerical full rank, two iterations of the classical Gram-Schmidt algorithm are enough to ensure that the orthogonality of the computed vectors is close to the unit roundoff level.
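The "two iterations" result above is the basis of the CGS2 (classical Gram-Schmidt with reorthogonalization) scheme. Below is a hedged NumPy sketch; the function name is illustrative. Each pass projects against all previous columns with one block inner product, which is why CGS-based schemes need fewer global reductions than modified Gram-Schmidt in parallel.

```python
import numpy as np

def cgs2(A):
    """QR factorization via classical Gram-Schmidt with one reorthogonalization
    pass (CGS2). Two CGS passes per column keep the computed Q orthogonal to
    near the unit roundoff level when A has numerical full rank."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for _ in range(2):               # two classical Gram-Schmidt passes
            c = Q[:, :j].T @ v           # one block inner product per pass
            v -= Q[:, :j] @ c
            R[:j, j] += c                # accumulate projection coefficients
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R
```

On a random full-rank matrix, `Q.T @ Q` agrees with the identity to machine precision and `Q @ R` reproduces `A`.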
Low synchronization Gram–Schmidt and generalized minimal residual algorithms
The main contribution is to introduce a backward normalization lag into the compact WY representation, resulting in a 𝒪(ε)κ([r0, AVm]) stable Generalized Minimal Residual Method (GMRES) algorithm that requires only one global reduction per iteration.
The Effects of Loss of Orthogonality on Large Scale Numerical Computations
A nice theoretical indicator of loss of orthogonality and linear independence is discussed, and it is shown how it leads to a related higher dimensional orthogonality that can be used to analyze and prove the effectiveness of such algorithms.
Reorthogonalization and stable algorithms for updating the Gram-Schmidt QR factorization
Numerically stable algorithms are given for updating the Gram-Schmidt QR factorization of an m × n matrix A (m > n) when A is modified by a matrix of rank one, or when a row or column is inserted or deleted.
Iterative Procedures for Nonlinear Integral Equations
A procedure is synthesized to offset some of the disadvantages of these techniques in this context; however, the procedure is not restricted to this particular class of systems of nonlinear equations.