FusedMM: A Unified SDDMM-SpMM Kernel for Graph Embedding and Graph Neural Networks

  • Md. Khaledur Rahman, Majedul Haque Sujon, Ariful Azad
  • 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
We develop a fused matrix multiplication kernel that unifies sampled dense-dense matrix multiplication (SDDMM) and sparse-dense matrix multiplication (SpMM) under a single operation called FusedMM. By using user-defined functions, FusedMM can capture almost all computational patterns needed by popular graph embedding and GNN approaches. FusedMM is an order of magnitude faster than its equivalent kernels in Deep Graph Library. The superior performance of FusedMM comes from the low-level vectorized kernels, a…
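To make the fusion idea concrete, here is a minimal pure-Python sketch that computes the same result two ways: as an SDDMM followed by an SpMM, and as a single fused pass that never stores the intermediate edge values. It is an illustration under stated assumptions, not the paper's kernel or its API: the function names (`sddmm`, `spmm`, `fused_mm`, `dot`), the CSR layout passed as `indptr` and `indices`, and the single `edge_op` callback standing in for the user-defined functions are all hypothetical.

```python
# Illustrative sketch only; names and signatures are hypothetical,
# not the FusedMM library's interface.

def dot(x, y):
    """Example edge function: inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def sddmm(indptr, indices, X, Y, edge_op):
    """SDDMM: one value per stored nonzero (i, j), edge_op(X[i], Y[j])."""
    vals = []
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            vals.append(edge_op(X[i], Y[indices[k]]))
    return vals

def spmm(indptr, indices, vals, Y):
    """SpMM: Z[i] = sum over stored (i, j) of vals[k] * Y[j]."""
    d = len(Y[0])
    Z = [[0.0] * d for _ in range(len(indptr) - 1)]
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            for t in range(d):
                Z[i][t] += vals[k] * Y[j][t]
    return Z

def fused_mm(indptr, indices, X, Y, edge_op):
    """Fused pass: compute each edge value and consume it immediately."""
    d = len(Y[0])
    Z = [[0.0] * d for _ in range(len(indptr) - 1)]
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            s = edge_op(X[i], Y[j])      # SDDMM step, value is not stored
            for t in range(d):
                Z[i][t] += s * Y[j][t]   # SpMM step
    return Z

# Toy graph in CSR form: node 0 -> {1, 2}, node 1 -> {0}, node 2 -> {}.
indptr = [0, 2, 3, 3]
indices = [1, 2, 0]
X = [[1.0, 1.0], [0.0, 2.0], [1.0, 1.0]]
Y = X  # same features on both sides, as in many embedding updates

Z_two_step = spmm(indptr, indices, sddmm(indptr, indices, X, Y, dot), Y)
Z_fused = fused_mm(indptr, indices, X, Y, dot)
```

Both paths visit the nonzeros in the same order, so `Z_fused` equals `Z_two_step`. The fused version avoids materializing one value per edge, which is one plausible source of the memory-traffic savings a fused kernel aims for.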
A Comprehensive Analytical Survey on Unsupervised and Semi-Supervised Graph Representation Learning Methods
  • Md. Khaledur Rahman, A. Azad
  • Computer Science
  • 2021
This paper rigorously scrutinizes the performance of embedding methods under various performance metrics and may serve as a comparative guide to help users select the methods most suitable for their tasks.
Parallel Minimum Spanning Forest Computation using Sparse Matrix Kernels
This work develops the first formulation of the Awerbuch-Shiloach parallel minimum spanning forest (MSF) algorithm using linear algebra primitives and introduces a multilinear kernel that operates on an adjacency matrix and two vectors.


GE-SpMM: General-purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks
GE-SpMM performs SpMM-like operation on sparse matrices represented in the most common Compressed Sparse Row (CSR) format, so it can be embedded in GNN frameworks with no preprocessing overheads and support general GNN algorithms.
FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling
Enhanced with importance sampling, FastGCN not only is efficient for training but also generalizes well for inference, and is orders of magnitude more efficient while predictions remain comparably accurate.
How Powerful are Graph Neural Networks?
This work characterizes the discriminative power of popular GNN variants, such as Graph Convolutional Networks and GraphSAGE, shows that they cannot learn to distinguish certain simple graph structures, and develops a simple architecture that is provably the most expressive among the class of GNNs.
Fast Graph Representation Learning with PyTorch Geometric
This work introduces PyTorch Geometric, a library built upon PyTorch for deep learning on irregularly structured input data such as graphs, point clouds and manifolds, and performs a comprehensive comparative study of the implemented methods in homogeneous evaluation scenarios.
GraphMat: High performance graph analytics made productive
GraphMat is a single-node multicore graph framework written in C++ that achieves better multicore scalability than other frameworks and is 1.2X off native, hand-optimized code on a variety of graph algorithms.
Graph Attention Networks
We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior
Sampled Dense Matrix Multiplication for High-Performance Machine Learning
cuSDDMM, a multi-node GPU-accelerated implementation of Sampled Dense-Dense Matrix Multiplication (SDDMM), improves significantly over the best currently available GPU implementation of SDDMM (in the BIDMach Machine Learning library).
Scalable SIMD-Efficient Graph Processing on GPUs
Warp Segmentation is presented, a novel method that greatly enhances GPU device utilization by dynamically assigning an appropriate number of SIMD threads to process a vertex with irregular-sized neighbors while employing a compact CSR representation to maximize the graph size that can be kept inside the GPU global memory.
Attention-based Graph Neural Network for Semi-supervised Learning
A novel graph neural network is proposed that removes all the intermediate fully-connected layers and replaces the propagation layers with attention mechanisms that respect the structure of the graph; the approach outperforms competing methods on benchmark citation network datasets.
Adaptive sparse tiling for sparse matrix multiplication
This paper devises an adaptive tiling strategy and applies it to enhance the performance of two primitives: SpMM (product of sparse matrix and dense matrix) and SDDMM (sampled dense-dense matrix multiplication).