Reducing Communication Costs for Sparse Matrix Multiplication within Algebraic Multigrid

@article{Ballard2016ReducingCC,
  title={Reducing Communication Costs for Sparse Matrix Multiplication within Algebraic Multigrid},
  author={Grey Ballard and Christopher M. Siefert and Jonathan J. Hu},
  journal={SIAM J. Sci. Comput.},
  year={2016},
  volume={38}
}
We consider the sequence of sparse matrix-matrix multiplications performed during the setup phase of algebraic multigrid. In particular, we show that the most commonly used parallel algorithm is often not the most communication-efficient one for all of the matrix-matrix multiplications involved. By using an alternative algorithm, we show that the communication costs are reduced (in theory and practice), and we demonstrate the performance benefit for both model (structured) and more realistic… 
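The setup-phase products in question are the Galerkin triple products of algebraic multigrid, where a coarse operator is formed from a fine operator and transfer matrices via sparse matrix-matrix multiplications. As a minimal, hypothetical illustration (a toy 1-D Poisson matrix and a piecewise-constant aggregation prolongator, not matrices from the paper), the product can be sketched with `scipy.sparse`:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy 1-D Poisson matrix (tridiagonal 2, -1) on 8 fine points.
n = 8
A = csr_matrix(np.diag([2.0] * n)
               + np.diag([-1.0] * (n - 1), 1)
               + np.diag([-1.0] * (n - 1), -1))

# Hypothetical piecewise-constant aggregation prolongator:
# each coarse point represents a pair of adjacent fine points.
P = csr_matrix((np.ones(n), (list(range(n)), [i // 2 for i in range(n)])),
               shape=(n, n // 2))
R = P.T.tocsr()  # restriction taken as the transpose of prolongation

# Galerkin coarse operator, formed by two sparse matrix-matrix products.
A_c = (R @ A) @ P
print(A_c.toarray())
```

The two products `R @ A` and `(R A) @ P` are exactly the kind of multiplications whose parallel communication costs the paper analyzes; the coarse operator here comes out tridiagonal (2, -1) on 4 points, as expected for this aggregation.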

Citations

Parallel memory-efficient all-at-once algorithms for the sparse matrix triple products in multigrid methods
TLDR
Two new algorithms are proposed that construct a coarse matrix in a single pass through the input matrices, without any auxiliary matrices, to save memory, and that scale well in both compute time and memory usage.
Hypergraph Partitioning for Parallel Sparse Matrix-Matrix Multiplication
TLDR
This paper characterizes the communication cost of a sparse matrix-matrix multiplication algorithm in terms of the size of a cut of an associated hypergraph that encodes the computation for a given input nonzero structure.
Hypergraph Partitioning for Sparse Matrix-Matrix Multiplication
TLDR
It is shown that identifying a communication-optimal algorithm for particular input matrices is equivalent to solving a hypergraph partitioning problem, and that hypergraphs are an accurate model for reasoning about the communication costs of SpGEMM as well as a practical tool for exploring the SpGEMM algorithm design space.
αSetup-AMG: an adaptive-setup-based parallel AMG solver for sequence of sparse linear systems
TLDR
The main idea behind αSetup-AMG is the introduction of a setup condition into the coarsening process, so that the setup is constructed as it is needed rather than built in advance in an independent phase, as in the traditional procedure.
TileSpGEMM: a tiled algorithm for parallel sparse general matrix-matrix multiplication on GPUs
TLDR
This paper proposes a tiled parallel SpGEMM algorithm that sparsifies the tiled method used in dense general matrix-matrix multiplication, stores each non-empty tile in a sparse form, and outperforms four state-of-the-art SpGEMM methods.
A Systematic Survey of General Sparse Matrix-Matrix Multiplication
TLDR
An experimental comparative study of existing CPU and GPU implementations of SpGEMM, covering optimizations from 1977 to 2019, is presented; it highlights future research directions and how future studies can leverage the findings to encourage better designs and implementations.
A Parallel Implementation of a Two-Level Overlapping Schwarz Method with Energy-Minimizing Coarse Space Based on Trilinos
We describe a new implementation of a two-level overlapping Schwarz preconditioner with an energy-minimizing coarse space (GDSW: generalized Dryja--Smith--Widlund) and show numerical results for an…
Technical Note: Improving the computational efficiency of sparse matrix multiplication in linear atmospheric inverse problems
TLDR
A hybrid-parallel sparse-sparse matrix multiplication approach is presented that is more efficient by a third, in both execution time and operation count, than the standard sparse matrix multiplication algorithms available in most libraries.
High-Performance Sparse Matrix-Matrix Products on Intel KNL and Multicore Architectures
TLDR
A critical finding is that hash-table-based SpGEMM gets a significant performance boost if the nonzeros are not required to be sorted within each row of the output matrix.
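The hash-table-based kernels discussed here accumulate each output row of the product with a hash table keyed by column index, which naturally leaves the row unsorted. A minimal pure-Python sketch of this row-wise (Gustavson-style) accumulation, using a dict-of-dicts sparse format invented for illustration (not the paper's kernel or data structures):

```python
def spgemm_rowwise(A, B):
    """Row-wise SpGEMM on dict-of-dicts sparse inputs.

    A, B: {row: {col: value}} sparse matrices.
    Returns C = A @ B in the same format, using a hash table
    (a Python dict) to accumulate each output row.
    """
    C = {}
    for i, row_a in A.items():
        acc = {}  # hash-table accumulator for row i of C
        for k, a_ik in row_a.items():
            # Scale row k of B by A[i, k] and merge into the accumulator.
            for j, b_kj in B.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
        if acc:
            C[i] = acc
    return C

# Tiny 2x2 example.
A = {0: {0: 1.0, 1: 2.0}, 1: {1: 3.0}}
B = {0: {0: 4.0}, 1: {0: 1.0, 1: 5.0}}
print(spgemm_rowwise(A, B))
```

Note that the dict accumulator imposes no column order on the output row, mirroring the finding above that skipping the per-row sort is where hash-based SpGEMM gains much of its speed.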

References

Showing 1-10 of 31 references
Parallel Sparse Matrix-Matrix Multiplication and Indexing: Implementation and Experiments
TLDR
It is demonstrated that the parallel SpGEMM methods, which use two-dimensional block data distributions with serial hypersparse kernels, are indeed highly flexible, scalable, and memory-efficient in the general case.
Communication optimal parallel multiplication of sparse random matrices
TLDR
Two new parallel algorithms are obtained and it is proved that they match the expected communication cost lower bound, and hence they are optimal.
Hypergraph Partitioning for Parallel Sparse Matrix-Matrix Multiplication
TLDR
This paper characterize the communication cost of a sparse matrix-matrix multiplication algorithm in terms of the size of a cut of an associated hypergraph that encodes the computation for a given input nonzero structure.
Parallel Smoothed Aggregation Multigrid: Aggregation Strategies on Massively Parallel Machines
TLDR
This paper considers parallelization of smoothed aggregation multigrid methods, discusses three different parallel aggregation algorithms, and illustrates the advantages and disadvantages of each variant in terms of parallelism and convergence.
A general parallel sparse-blocked matrix multiply for linear scaling SCF theory
ML 5.0 Smoothed Aggregation User's Guide
TLDR
This document describes one specific algebraic multigrid approach: smoothed aggregation, a multilevel and domain decomposition method for symmetric and nonsymmetric systems of equations (like elliptic equations, or compressible and incompressible fluid dynamics problems).
Sparse matrix multiplication: The distributed block-compressed sparse row library
Sparse Matrix-Matrix Products Executed Through Coloring
TLDR
This paper proposes a new algorithm for computing sparse matrix-matrix products by exploiting their nonzero structure through the process of graph coloring and proves its viability for examples including multigrid methods used to solve boundary value problems as well as matrix products appearing in unstructured applications.
Optimizing Sparse Matrix-Matrix Multiplication for the GPU
TLDR
The implementation is fully general and the optimization strategy adaptively processes the SpGEMM workload row-wise to substantially improve performance by decreasing the work complexity and utilizing the memory hierarchy more effectively.
Simultaneous Input and Output Matrix Partitioning for Outer-Product-Parallel Sparse Matrix-Matrix Multiplication
TLDR
Three hypergraph models are proposed that achieve simultaneous partitioning of the input and output matrices, without any replication of input data, for outer-product-parallel sparse matrix-matrix multiplication (SpGEMM).