Increasing the Performance of the Jacobi-Davidson Method by Blocking

@article{RhrigZllner2015IncreasingTP,
  title={Increasing the Performance of the Jacobi-Davidson Method by Blocking},
  author={Melven R{\"o}hrig-Z{\"o}llner and Jonas Thies and Moritz Kreutzer and Andreas Alvermann and Andreas Pieper and Achim Basermann and Georg Hager and Gerhard Wellein and Holger Fehske},
  journal={SIAM J. Sci. Comput.},
  year={2015},
  volume={37}
}
Block variants of the Jacobi--Davidson method for computing a few eigenpairs of a large sparse matrix are known to improve the robustness of the standard algorithm when it comes to computing multiple or clustered eigenvalues. In practice, however, they are typically avoided because the total number of matrix-vector operations increases. In this paper we present the implementation of a block Jacobi--Davidson solver. By detailed performance engineering and numerical experiments we demonstrate… 
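The memory-traffic argument behind blocking can be sketched in a few lines: applying the sparse matrix to a block of vectors at once streams the matrix from memory only once instead of once per vector, even though the arithmetic count is identical. The sketch below uses a random SciPy test matrix as a hypothetical stand-in for the large sparse matrices in the paper; it illustrates the blocked kernel, not the paper's actual implementation.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical stand-in for a large sparse matrix; made symmetric,
# matching the Hermitian eigenproblem setting.
n, nb = 1000, 4
A = sp.random(n, n, density=0.01, format="csr", random_state=0)
A = A + A.T

rng = np.random.default_rng(0)
V = rng.standard_normal((n, nb))  # block of nb vectors

# Unblocked: nb separate matrix-vector products; the matrix entries
# are streamed from memory nb times.
W_single = np.column_stack([A @ V[:, j] for j in range(nb)])

# Blocked: one sparse matrix-times-block product; the matrix entries
# are streamed only once. Same result, less memory traffic.
W_block = A @ V

assert np.allclose(W_single, W_block)
```

On memory-bound hardware the blocked product runs closer to the single-product time than to nb times it, which is the effect a block Jacobi-Davidson solver exploits to offset the extra matrix-vector operations mentioned in the abstract.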

Citations

Convergence of integration-based methods for the solution of standard and generalized Hermitian eigenvalue problems
TLDR
The progress of the Rayleigh-Ritz process and the achievable quality of the computed eigenpairs are investigated for the case that an upper bound for the normwise difference between the currently used subspace and the desired eigenspace is available.
Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems
TLDR
This work demonstrates the high optimization potential and feasibility of petascale heterogeneous CPU-GPU implementations of the Kernel Polynomial Method and uses the optimized node-level KPM within a hybrid-parallel framework to perform large-scale heterogeneous electronic structure calculations for novel topological materials on a petascale-class Cray XC30 system.
Block Conjugate-Gradient Method With Multilevel Preconditioning and GPU Acceleration for FEM Problems in Electromagnetics
TLDR
It is demonstrated that blocking reduces the time to solution significantly and allows for better utilization of the computing power of GPUs, especially when the system matrix is complex valued.
Implementation and Performance Engineering of the Kaczmarz Method for Parallel Systems
TLDR
Hardware efficiency and scalable shared-memory parallelization strategies for the Kaczmarz method, when used as a solver for sparse linear systems, are investigated, along with a novel "block multicoloring" method that leverages structural features of (partly) band- or hull-structured matrices.
PHIST: A Pipelined, Hybrid-Parallel Iterative Solver Toolkit
TLDR
This paper demonstrates how an existing implementation of a block Krylov-Schur method in the Trilinos package Anasazi can benefit from the performance engineering techniques used in PHIST.
Performance Engineering and Energy Efficiency of Building Blocks for Large, Sparse Eigenvalue Computations on Heterogeneous Supercomputers
TLDR
The use of PE is demonstrated in optimizing a density of states computation using the Kernel Polynomial Method, and it is shown that reduction of runtime and reduction of energy are literally the same goal in this case.
A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations
TLDR
A performance model is developed that allows the authors to correctly estimate the performance of their SpMM kernel implementations, and cache bandwidth is identified as a potential performance bottleneck beyond DRAM.
A Scalable Matrix-Free Iterative Eigensolver for Studying Many-Body Localization
TLDR
The efficiency and effectiveness of the proposed algorithm are demonstrated by computing eigenstates in a massively parallel fashion and analyzing their entanglement entropy to gain insight into the many-body localization (MBL) transition.
Performance of Block Jacobi-Davidson
TLDR
A standard restarted GMRES method (unpreconditioned) is used, with a single iteration to apply the operator to the preceding basis vector, and local operations are performed on basis vectors stored as blocks in a ring buffer.
...

References

Showing 1-10 of 49 references
Jacobi-Davidson Style QR and QZ Algorithms for the Reduction of Matrix Pencils
TLDR
Two algorithms, JDQZ for the generalized eigenproblem and JDQR for the standard eigenproblem, that are based on the iterative construction of a (generalized) partial Schur form are presented, suitable for the efficient computation of several eigenvalues and the corresponding eigenvectors near a user-specified target value in the complex plane.
The Jacobi–Davidson method
TLDR
The Jacobi–Davidson method is reviewed, with the emphasis on recent developments that are important in practical use.
Nearly Optimal Preconditioned Methods for Hermitian Eigenproblems under Limited Memory. Part I: Seeking One Eigenvalue
TLDR
This research approaches the eigenproblem from the nonlinear perspective, which helps to develop two nearly optimal methods, one of which extends the recent Jacobi-Davidson conjugate gradient method to JDQMR, improving robustness and efficiency.
Improving the Performance of Dynamical Simulations Via Multiple Right-Hand Sides
TLDR
This paper shows how to redesign a dynamical simulation to exploit GSPMV in a way that is not initially obvious, because only one vector is available at a time, and measures a 30 percent performance speedup in single-node, data-parallel simulations.
Nearly Optimal Preconditioned Methods for Hermitian Eigenproblems Under Limited Memory. Part II: Seeking Many Eigenvalues
TLDR
It is argued that any eigenmethod with O(1) basis size, preconditioned or not, will be superseded asymptotically by Lanczos-type methods that use O(nev) vectors in the basis; however, this may not happen until nev > O(1000).
Convergence Analysis of Inexact Rayleigh Quotient Iteration
TLDR
It is shown that the general convergence result straightforwardly applies in this context and permits tracing the convergence of the eigenpair as a function of the number of inner iterations performed at each step.
Towards Realistic Performance Bounds for Implicit CFD Codes
Locking issues for finding a large number of eigenvectors of Hermitian matrices
TLDR
The goal of this paper is to determine when locking is computationally preferable, and for which eigensolvers, and to address a subtle numerical, but not floating point, problem that arises with locking.
Communication-avoiding Krylov subspace methods
TLDR
This thesis aims to take s steps of a KSM for the same communication cost as 1 step, which would be optimal, and proposes techniques for developing communication-avoiding versions of nonsymmetric Lanczos iteration and the Method of Conjugate Gradients for solving linear systems.
...