# FusedMM: A Unified SDDMM-SpMM Kernel for Graph Embedding and Graph Neural Networks

```bibtex
@article{Rahman2021FusedMMAU,
  title   = {FusedMM: A Unified SDDMM-SpMM Kernel for Graph Embedding and Graph Neural Networks},
  author  = {Md. Khaledur Rahman and Majedul Haque Sujon and Ariful Azad},
  journal = {2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)},
  year    = {2021},
  pages   = {256-266}
}
```

We develop a fused matrix multiplication kernel that unifies sampled dense-dense matrix multiplication and sparse-dense matrix multiplication under a single operation called FusedMM. By using user-defined functions, FusedMM can capture almost all computational patterns needed by popular graph embedding and GNN approaches. FusedMM is an order of magnitude faster than its equivalent kernels in Deep Graph Library. The superior performance of FusedMM comes from the low-level vectorized kernels, a…
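As a sketch of the pattern the abstract describes (function names and CSR layout here are illustrative, not the paper's actual API), an unfused pipeline first runs SDDMM into a temporary sparse matrix and then runs SpMM over it; the fused variant interleaves the two per edge, so the intermediate sparse matrix is never materialized:

```python
import numpy as np

def fused_sddmm_spmm(indptr, indices, data, X, Y):
    # For each edge (i, j) of a CSR graph, compute the SDDMM element
    # s = data[p] * <X[i], Y[j]> and immediately accumulate s * Y[j]
    # into output row i, avoiding a materialized intermediate matrix.
    n, d = X.shape[0], Y.shape[1]
    out = np.zeros((n, d))
    for i in range(n):
        for p in range(indptr[i], indptr[i + 1]):
            j = indices[p]
            s = data[p] * (X[i] @ Y[j])  # SDDMM: sampled dot product
            out[i] += s * Y[j]           # SpMM: feature aggregation
    return out
```

Beyond saving the intermediate storage, fusing lets the sampled value be consumed while it is still in registers, which is the locality benefit the paper's vectorized kernels exploit.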


## 2 Citations

A Comprehensive Analytical Survey on Unsupervised and Semi-Supervised Graph Representation Learning Methods

- Computer Science, ArXiv
- 2021

This paper rigorously scrutinizes the performance of embedding methods under various performance metrics and may serve as a comparative guide to help users select the methods most suitable for their tasks.

Parallel Minimum Spanning Forest Computation using Sparse Matrix Kernels

- Computer Science, ArXiv
- 2021

This work develops the first formulation of the Awerbuch-Shiloach parallel minimum spanning forest (MSF) algorithm using linear algebra primitives and introduces a multilinear kernel that operates on an adjacency matrix and two vectors.

## References

Showing 1-10 of 33 references

GE-SpMM: General-purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks

- Computer Science, SC
- 2020

GE-SpMM performs SpMM-like operation on sparse matrices represented in the most common Compressed Sparse Row (CSR) format, so it can be embedded in GNN frameworks with no preprocessing overheads and support general GNN algorithms.
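The SpMM operation GE-SpMM builds on can be sketched directly over the CSR arrays the blurb mentions (this is a minimal scalar sketch of the operation, not GE-SpMM's GPU implementation):

```python
import numpy as np

def csr_spmm(indptr, indices, data, B):
    # C = A @ B, where A is a sparse matrix given by its CSR arrays
    # (indptr, indices, data) and B is a dense feature matrix.
    # Each output row is a weighted sum of the neighbor rows of B.
    n = len(indptr) - 1
    C = np.zeros((n, B.shape[1]))
    for i in range(n):
        for p in range(indptr[i], indptr[i + 1]):
            C[i] += data[p] * B[indices[p]]
    return C
```

Because CSR is the format most graph frameworks already store adjacency in, operating on it directly is what lets GE-SpMM avoid preprocessing overheads.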

FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling

- Computer Science, Mathematics, ICLR
- 2018

Enhanced with importance sampling, FastGCN not only is efficient for training but also generalizes well for inference, and is orders of magnitude more efficient while predictions remain comparably accurate.

How Powerful are Graph Neural Networks?

- Computer Science, Mathematics, ICLR
- 2019

This work characterizes the discriminative power of popular GNN variants, such as Graph Convolutional Networks and GraphSAGE, shows that they cannot learn to distinguish certain simple graph structures, and develops a simple architecture that is provably the most expressive in the class of GNNs.

Fast Graph Representation Learning with PyTorch Geometric

- Computer Science, Mathematics, ArXiv
- 2019

PyTorch Geometric is introduced, a library for deep learning on irregularly structured input data such as graphs, point clouds and manifolds, built upon PyTorch, and a comprehensive comparative study of the implemented methods in homogeneous evaluation scenarios is performed.

GraphMat: High performance graph analytics made productive

- Computer Science, Proc. VLDB Endow.
- 2015

GraphMat is a single-node multicore graph framework written in C++ that achieves better multicore scalability than other frameworks and is within 1.2X of native, hand-optimized code on a variety of graph algorithms.

Graph Attention Networks

- Mathematics, Computer Science, ICLR
- 2018

We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior…

Sampled Dense Matrix Multiplication for High-Performance Machine Learning

- Computer Science, 2018 IEEE 25th International Conference on High Performance Computing (HiPC)
- 2018

cuSDDMM, a multi-node GPU-accelerated implementation of sampled dense-dense matrix multiplication, improves significantly over the best currently available GPU implementation of SDDMM (in the BIDMach machine learning library).
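The SDDMM operation itself is compact enough to state as a sketch (a NumPy definition of the operation, not cuSDDMM's CUDA implementation): the dense product X @ Y.T is evaluated only at the nonzero positions of a sparse pattern.

```python
import numpy as np

def sddmm(rows, cols, vals, X, Y):
    # Sample the dense product X @ Y.T only at the sparse pattern's
    # nonzero positions (rows[k], cols[k]), scaling each sampled
    # dot product by the corresponding sparse value vals[k].
    return vals * np.einsum('ij,ij->i', X[rows], Y[cols])
```

Skipping the zero positions is where the savings come from: the full dense product costs O(n^2 d), while SDDMM costs O(nnz * d).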

Scalable SIMD-Efficient Graph Processing on GPUs

- Computer Science, 2015 International Conference on Parallel Architecture and Compilation (PACT)
- 2015

Warp Segmentation is presented, a novel method that greatly enhances GPU device utilization by dynamically assigning an appropriate number of SIMD threads to process a vertex with irregularly sized neighborhoods, while employing a compact CSR representation to maximize the graph size that can be kept inside GPU global memory.

Attention-based Graph Neural Network for Semi-supervised Learning

- Computer Science, Mathematics, ArXiv
- 2018

A novel graph neural network is proposed that removes all the intermediate fully-connected layers, and replaces the propagation layers with attention mechanisms that respect the structure of the graph, and demonstrates that this approach outperforms competing methods on benchmark citation networks datasets.

Adaptive sparse tiling for sparse matrix multiplication

- Computer Science, PPoPP
- 2019

This paper devises an adaptive tiling strategy and applies it to enhance the performance of two primitives: SpMM (the product of a sparse matrix and a dense matrix) and SDDMM (sampled dense-dense matrix multiplication).