Optimizing Sparse Matrix Multiplications for Graph Neural Networks

S. Roger Qiu, You Liang, Zheng Wang
Graph neural networks (GNNs) are emerging as a powerful technique for modeling graph structures. Due to the sparsity of real-world graph data, GNN performance is limited by the extensive sparse matrix multiplication (SpMM) operations involved in computation. While the right sparse matrix storage format varies across input data, existing deep learning frameworks employ a single, static storage format, leaving much room for improvement. This paper investigates how the choice of sparse matrix storage…
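To make the notion of a storage format concrete, the sketch below stores one small adjacency matrix in two common formats, coordinate (COO) and compressed sparse row (CSR), and multiplies each by a dense feature matrix. It is a minimal pure-Python illustration, not the paper's method or any framework's implementation: the function names (coo_to_csr, spmm_coo, spmm_csr) and the three-node example graph are hypothetical and chosen only for this example.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert COO triples to CSR arrays (indptr, indices, data)."""
    counts = [0] * n_rows
    for r in rows:
        counts[r] += 1
    # indptr[r] .. indptr[r + 1] delimits the nonzeros of row r.
    indptr = [0] * (n_rows + 1)
    for i in range(n_rows):
        indptr[i + 1] = indptr[i] + counts[i]
    indices = [0] * len(vals)
    data = [0.0] * len(vals)
    next_slot = indptr[:-1].copy()  # next free position for each row
    for r, c, v in zip(rows, cols, vals):
        k = next_slot[r]
        indices[k] = c
        data[k] = v
        next_slot[r] += 1
    return indptr, indices, data


def spmm_coo(n_rows, rows, cols, vals, dense):
    """SpMM over COO: one pass over the nonzeros, scattered row writes."""
    n_feat = len(dense[0])
    out = [[0.0] * n_feat for _ in range(n_rows)]
    for r, c, v in zip(rows, cols, vals):
        for j in range(n_feat):
            out[r][j] += v * dense[c][j]
    return out


def spmm_csr(indptr, indices, data, dense):
    """SpMM over CSR: each output row is computed from a contiguous slice."""
    n_rows = len(indptr) - 1
    n_feat = len(dense[0])
    out = [[0.0] * n_feat for _ in range(n_rows)]
    for r in range(n_rows):
        for k in range(indptr[r], indptr[r + 1]):
            c, v = indices[k], data[k]
            for j in range(n_feat):
                out[r][j] += v * dense[c][j]
    return out


# Hypothetical 3-node graph with edges 0->1, 0->2, 2->0 (illustrative only).
rows = [0, 0, 2]
cols = [1, 2, 0]
vals = [1.0, 1.0, 1.0]
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

indptr, indices, data = coo_to_csr(3, rows, cols, vals)
out_coo = spmm_coo(3, rows, cols, vals, features)
out_csr = spmm_csr(indptr, indices, data, features)
```

Both formats give the same product; they differ in how the nonzeros are laid out and traversed, which is why one format can suit a given input matrix better than another.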

Parallel and Distributed Graph Neural Networks: An In-Depth Concurrency Analysis

A taxonomy of parallelism in GNNs is designed, covering data and model parallelism and different forms of pipelining; the outcomes are synthesized into a set of insights that help to maximize GNN performance and a comprehensive list of challenges and opportunities for further research into GNN computations.

Dynamic GPU Energy Optimization for Machine Learning Training Workloads

This paper presents GPOEO, an online GPU energy optimization framework for machine learning training workloads that dynamically determines the optimal energy configuration by employing novel techniques for online measurement, multi-objective prediction modeling, and search optimization.



Adaptive Filters and Aggregator Fusion for Efficient Graph Convolutions

This work presents a new type of GNN architecture that achieves state-of-the-art performance with lower memory consumption and latency, along with characteristics suited to accelerator implementation, and proposes aggregator fusion, a technique to enable GNNs to significantly boost their representational power.

SMAT: an input adaptive auto-tuner for sparse matrix-vector multiplication

This paper presents SMAT, a sparse matrix-vector multiplication auto-tuning system that bridges the gap between specific optimizations and general-purpose usage by automatically determining the optimal format and implementation for any input sparse matrix at runtime.

Graph Neural Networks: A Review of Methods and Applications

Automatic Selection of Sparse Matrix Representation on GPUs

This paper performs extensive characterization of pertinent sparsity features of around 700 sparse matrices and their SpMV performance with a number of sparse representations implemented in the NVIDIA CUSP and cuSPARSE libraries, and builds a decision model using machine learning to automatically select the best representation to use for a given sparse matrix on a given target platform.

Understanding and bridging the gaps in current GNN performance optimizations

An in-depth examination of the state-of-the-art GNN frameworks is provided, revealing five major gaps in the current frameworks in optimizing GNN performance, especially in handling the special complexities of GNN over traditional graph or DNN operations.

Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs

Deep Graph Library (DGL) enables arbitrary message handling and mutation operators and flexible propagation rules, and is framework agnostic so as to leverage high-performance tensor and autograd operations and other feature extraction modules already available in existing frameworks.

Sparse Matrix Classification on Imbalanced Datasets Using Convolutional Neural Networks

This paper uses convolutional neural networks and proposes several solutions to mitigate the bias toward the majority classes when the data are not balanced, and introduces a new network called SpNet, which achieves better prediction accuracy than a standard network such as AlexNet despite having a simpler architecture.

GNN-FiLM: Graph Neural Networks with Feature-wise Linear Modulation

This paper presents a new Graph Neural Network type using feature-wise linear modulation (FiLM), which outperforms baseline methods on a regression task on molecular graphs and performs competitively on other tasks.

Optimizing Sparse Matrix–Vector Multiplications on an ARMv8-based Many-Core Architecture

This work develops a quantitative approach to characterize SpMV performance on a recent ARMv8-based many-core architecture, Phytium FT-2000 Plus (FTP), and proposes a machine learning based model that predicts the best storage format and parameters using input matrix features.

Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking

Graph2Gauss is proposed, an approach that efficiently learns versatile node embeddings on large-scale (attributed) graphs; the embeddings show strong performance on tasks such as link prediction and node classification, and the benefits of modeling uncertainty are demonstrated.