# Combinatorial BLAS 2.0: Scaling Combinatorial Algorithms on Distributed-Memory Systems

@article{Azad2022CombinatorialB2, title={Combinatorial BLAS 2.0: Scaling Combinatorial Algorithms on Distributed-Memory Systems}, author={Ariful Azad and Oguz Selvitopi and Md Taufique Hussain and John R. Gilbert and Aydın Buluç}, journal={IEEE Transactions on Parallel and Distributed Systems}, year={2022}, volume={33}, pages={989-1001} }

Combinatorial algorithms, such as those arising in graph analysis, modeling of discrete systems, bioinformatics, and chemistry, are often hard to parallelize. The Combinatorial BLAS library implements key computational primitives for rapid development of combinatorial algorithms on distributed-memory systems. During the decade since its first introduction, the Combinatorial BLAS library has evolved and expanded significantly. This article details many of the key technical features of…
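The central idea behind Combinatorial BLAS is expressing graph algorithms as sparse linear algebra over semirings. A minimal plain-Python sketch of this formulation (not the library's C++/MPI API) is breadth-first search as repeated sparse matrix-vector products over the boolean (OR, AND) semiring, where the adjacency structure is a dict of sparse rows:

```python
def bfs_levels(adj, source):
    """BFS as repeated SpMV over the boolean semiring:
    frontier_{k+1} = A^T * frontier_k, masked by the unvisited vertices.

    adj: dict mapping vertex -> iterable of out-neighbors (sparse rows).
    Returns a dict mapping each reachable vertex to its BFS level.
    """
    levels = {source: 0}
    frontier = {source}
    k = 0
    while frontier:
        k += 1
        # Semiring SpMV: multiply = AND (edge present and vertex in the
        # frontier), add = OR (any in-neighbor in the frontier suffices).
        nxt = set()
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in levels:   # mask: keep only unvisited vertices
                    nxt.add(v)
        for v in nxt:
            levels[v] = k
        frontier = nxt
    return levels
```

In the library itself, the same loop is a masked `SpMV` call on a distributed sparse matrix; the sketch above only illustrates the algebraic structure.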


## 8 Citations

### Parallel Algorithms for Adding a Collection of Sparse Matrices

- Computer Science
- 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
- 2022

This paper presents a series of algorithms using tree merging, heap, sparse accumulator, hash table, and sliding hash table data structures that attain the theoretical lower bounds on both computational and I/O complexity and perform best in practice for SpKAdd.
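The SpKAdd operation (summing k sparse matrices) can be sketched with one of the data structures the abstract names, a hash-table sparse accumulator; this is an illustrative plain-Python version, not the paper's implementation:

```python
def spkadd(matrices):
    """Sum k sparse matrices, each stored as a {(row, col): value} dict,
    using a hash-table accumulator. Each input nonzero is touched exactly
    once, so the work is proportional to the total number of input nonzeros.
    """
    acc = {}
    for m in matrices:
        for key, val in m.items():
            acc[key] = acc.get(key, 0) + val
    # Drop explicit zeros produced by numerical cancellation.
    return {k: v for k, v in acc.items() if v != 0}
```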

### Fast Dynamic Updates and Dynamic SpGEMM on MPI-Distributed Graphs

- Computer Science
- ArXiv
- 2022

This paper proposes a batch-dynamic algorithm for MPI-based parallel computing that reduces the communication volume of SpGEMM by exploiting that updates change far fewer matrix entries than there are non-zeros in the input operands.

### TileSpGEMM: a tiled algorithm for parallel sparse general matrix-matrix multiplication on GPUs

- Computer Science
- PPoPP
- 2022

This paper proposes a tiled parallel SpGEMM algorithm that sparsifies the tiling method used in dense general matrix-matrix multiplication, stores each non-empty tile in a sparse form, and outperforms four state-of-the-art SpGEMM methods.

### Distributed-Memory Parallel Contig Generation for De Novo Long-Read Genome Assembly

- Computer Science
- ArXiv
- 2022

This work presents a novel distributed memory algorithm that, from a string graph representation of the genome and using sparse matrices, generates the contig set, i.e., overlapping sequences that form a map representing a region of a chromosome.

### Distributed-Memory Sparse Kernels for Machine Learning

- Computer Science
- 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- 2022

Sampled Dense Times Dense Matrix Multiplication (SDDMM) and Sparse Times Dense Matrix Multiplication (SpMM) appear in diverse settings, such as collaborative filtering, document clustering, and graph…
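Of the two kernels named here, SDDMM has a compact definition worth making concrete: C = S .* (A Bᵀ), evaluated only at the nonzero positions of the sparse sampling matrix S. A minimal plain-Python sketch (illustrative only, not the paper's distributed implementation):

```python
def sddmm(s_nonzeros, A, B):
    """Sampled dense-dense matrix multiplication.

    For each nonzero (i, j, v) of the sparse sampling matrix S, compute
    v * dot(A[i], B[j]), i.e. C = S .* (A @ B^T) restricted to S's pattern.

    A, B: lists of equal-length dense rows.
    s_nonzeros: list of (i, j, v) triples.
    Returns a list of (i, j, value) triples for C.
    """
    out = []
    for i, j, v in s_nonzeros:
        dot = sum(a * b for a, b in zip(A[i], B[j]))
        out.append((i, j, v * dot))
    return out
```

Because only the sampled positions are computed, the cost is proportional to nnz(S) times the embedding dimension rather than to the full dense product.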

### GraphBLAS on the Edge: High Performance Streaming of Network Traffic

- Computer Science
- ArXiv
- 2022

The performance of GraphBLAS on an Accolade Technologies edge network device is evaluated on a near worst-case traffic scenario using a continuous stream of CAIDA Telescope darknet packets, demonstrating that anonymized hypersparse traffic matrices are readily computable on edge network devices with minimal compute resources and can be a viable data product for such devices.

### GraphBLAS on the Edge: Anonymized High Performance Streaming of Network Traffic

- Computer Science
- 2022

The performance of GraphBLAS on an Accolade Technologies edge network device is evaluated on a near worst-case traffic scenario using a continuous stream of CAIDA Telescope darknet packets, demonstrating that anonymized hypersparse traffic matrices are readily computable on edge network devices with minimal compute resources and can be a viable data product for such devices.

## References

Showing 1-10 of 47 references

### The Combinatorial BLAS: design, implementation, and applications

- Computer Science
- Int. J. High Perform. Comput. Appl.
- 2011

The parallel Combinatorial BLAS is described, which consists of a small but powerful set of linear algebra primitives specifically targeting graph and data mining applications, and an extensible library interface and some guiding principles for future development are provided.

### Distributed-memory parallel algorithms for sparse times tall-skinny-dense matrix multiplication

- Computer Science
- ICS
- 2021

The evaluations reveal that with the involvement of GPU accelerators, the best design choices for SpMM differ from the conventional algorithms that are known to perform well for dense matrix-matrix or sparse matrix-sparse matrix multiplies.

### The Reverse Cuthill-McKee Algorithm in Distributed-Memory

- Computer Science
- 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- 2017

This paper presents the first-ever distributed-memory implementation of the reverse Cuthill-McKee (RCM) algorithm for reducing the profile of a sparse matrix and achieves high performance by decomposing the problem into a small number of primitives and utilizing optimized implementations of these primitives.

### Optimizing High Performance Markov Clustering for Pre-Exascale Architectures

- Computer Science
- 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- 2020

This work systematically removes scalability and performance bottlenecks of HipMCL and enables GPU acceleration by performing the expensive expansion phase of the MCL algorithm on GPUs; it proposes a CPU-GPU joint distributed SpGEMM algorithm called pipelined Sparse SUMMA and integrates a fast and accurate probabilistic memory-requirement estimator.

### Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale

- Computer Science
- 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- 2021

This work developed a distributed symbolic step to estimate the memory requirement and determine the number of batches beforehand, and integrated the multiplication in each batch with existing communication-avoiding techniques to reduce communication overhead while multiplying matrices on a 3-D process grid.

### LACC: A Linear-Algebraic Algorithm for Finding Connected Components in Distributed Memory

- Computer Science
- 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- 2019

This paper presents a parallel connected-components algorithm for distributed-memory computers that uses linear-algebraic primitives; based on a PRAM algorithm by Awerbuch and Shiloach, it outperforms previous algorithms by a significant margin.

### An Efficient GPU General Sparse Matrix-Matrix Multiplication for Irregular Data

- Computer Science
- 2014 IEEE 28th International Parallel and Distributed Processing Symposium
- 2014

This work presents a GPU SpGEMM algorithm that focuses on load balancing, memory pre-allocation for the result matrix, and parallel insertion of the nonzero entries, and is experimentally found to be the fastest GPU merge approach.

### Distributed-Memory Algorithms for Maximum Cardinality Matching in Bipartite Graphs

- Computer Science
- 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- 2016

This work designs and implements scalable distributed-memory algorithms for maximum cardinality matching in bipartite graphs and employs bulk-synchronous matrix algebraic modules to implement graph searches, and Remote Memory Access (RMA) operations to map asynchronous light-weight graph accesses.

### On the representation and multiplication of hypersparse matrices

- Computer Science
- 2008 IEEE International Symposium on Parallel and Distributed Processing
- 2008

This paper develops and analyzes two new algorithms for sparse matrix-matrix multiplication (SpGEMM) that scale significantly better than existing kernels, considering them first as the sequential kernel of a scalable parallel sparse matrix multiplication algorithm and second as part of a polyalgorithm that executes different kernels depending on the sparsity of the input matrices.

### Performance optimization, modeling and analysis of sparse matrix-matrix products on multi-core and many-core processors

- Computer Science
- Parallel Comput.
- 2019