Corpus ID: 220830741

Efficient Inference on GPUs for the Sparse Deep Neural Network Graph Challenge 2020

Mert Hidayetoglu, Carl Pearson, Vikram Sharma Mailthody, Eiman Ebrahimi, Jinjun Xiong, Rakesh Nagi, Wen-mei W. Hwu
This paper presents GPU performance optimization and scaling results for the Sparse Deep Neural Network Challenge 2020. Demands for network quality have increased rapidly, pushing the size and thus the memory requirements of many neural networks beyond the capacity of available accelerators. Sparse deep neural networks (SpDNN) have shown promise for reining in the memory footprint of large neural networks. However, there is room for improvement in implementing SpDNN operations on GPUs. This…
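The challenge's core computation can be sketched as repeated sparse matrix-matrix products with a biased, capped ReLU. A minimal illustration using SciPy (hypothetical sizes and densities; the actual challenge networks use fixed RadiX-Net weights, one uniform negative bias per layer, and activations capped at 32):

```python
import numpy as np
from scipy.sparse import csr_matrix, random as sparse_random

# Illustrative sizes only; the challenge networks go up to 1920 layers
# and 65536 neurons per layer.
n_inputs, n_neurons, n_layers = 8, 16, 3
bias = -0.1  # stand-in for the challenge's uniform negative per-layer bias

Y = csr_matrix(sparse_random(n_inputs, n_neurons, density=0.3, random_state=0))
layers = [csr_matrix(sparse_random(n_neurons, n_neurons, density=0.2,
                                   random_state=k + 1))
          for k in range(n_layers)]

for W in layers:
    Z = Y @ W                          # SpMM: sparse x sparse product
    # With a negative bias, entries absent from Y @ W stay zero after
    # ReLU, so the bias only needs to be applied to stored nonzeros.
    Z.data = np.minimum(np.maximum(Z.data + bias, 0.0), 32.0)  # capped ReLU
    Z.eliminate_zeros()                # drop entries zeroed by ReLU
    Y = Z

print(Y.shape)  # (8, 16)
```

The GPU implementations discussed in these papers optimize exactly this loop: the SpMM dominates runtime, so kernel fusion of the product, bias, and ReLU is the usual first step.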


Scalable Inference for Sparse Deep Neural Networks using Kokkos Kernels
This work bases its sparse DNN implementation, KK-SpDNN, on the sparse linear algebra kernels within the Kokkos Kernels library, using the sparse matrix-matrix multiplication in Kokkos Kernels to reuse a highly optimized kernel.
Performance of Training Sparse Deep Neural Networks on GPUs
A Fine-tune Structured Sparsity Learning (FSSL) method is proposed to regularize the structures of DNNs and accelerate their training; results show superior performance and efficiency compared to the MATLAB example code.
Accelerating DNN Inference with GraphBLAS and the GPU
This work addresses the 2019 Sparse Deep Neural Network Graph Challenge with an implementation using the GraphBLAS programming model. We demonstrate our solution to this challenge…
Sparse Deep Neural Network Graph Challenge
The proposed Sparse Deep Neural Network (DNN) Challenge draws upon prior challenges from machine learning, high performance computing, and visual analytics to create a challenge that is reflective of emerging sparse AI systems.
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
GPipe is introduced, a pipeline parallelism library that allows scaling any network that can be expressed as a sequence of layers by pipelining different sub-sequences of layers on separate accelerators, resulting in almost linear speedup when a model is partitioned across multiple accelerators.
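As a rough sketch of the idea (not GPipe's actual implementation), pipeline parallelism splits a batch into micro-batches and staggers them through the layer stages, so that at step t stage s works on micro-batch t - s. The stages and micro-batches below are hypothetical toy values:

```python
# Toy pipeline schedule over a sequence of layer "stages". Real systems
# place each stage on its own accelerator so staggered steps overlap.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]  # 3 "layers"
micro_batches = [1, 2, 3, 4]

n_stages, n_micro = len(stages), len(micro_batches)
in_flight = {}   # micro-batch index -> current activation
outputs = []

for step in range(n_stages + n_micro - 1):
    # Walk stages back-to-front so each micro-batch advances exactly
    # one stage per step (each stage reads last step's activation).
    for s in reversed(range(n_stages)):
        m = step - s                  # micro-batch at stage s this step
        if 0 <= m < n_micro:
            x = micro_batches[m] if s == 0 else in_flight[m]
            in_flight[m] = stages[s](x)
            if s == n_stages - 1:
                outputs.append(in_flight.pop(m))

print(outputs)  # [1, 3, 5, 7]
```

Note how the pipeline drains after n_stages + n_micro - 1 steps; the "bubble" at the start and end is why more micro-batches bring the speedup closer to linear.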
A GPU Implementation of the Sparse Deep Neural Network Graph Challenge
  • M. Bisson, M. Fatica
  • Computer Science
  • 2019 IEEE High Performance Extreme Computing Conference (HPEC)
  • 2019
A CUDA implementation of the latest addition to the Graph Challenge, the inference computation on a collection of large sparse deep neural networks, uses the managed memory API available in CUDA to allow simple and efficient distribution of these computations across a multi-GPU NVIDIA DGX-2 server.
Sparse Deep Neural Network Performance
These submissions show that state-of-the-art sparse DNN execution time, T_DNN, is a strong function of the number of DNN operations performed, N_op, and underscore the need for new innovations to achieve high performance on very large sparse DNNs.
RadiX-Net: Structured Sparse Matrices for Deep Neural Networks
  • Ryan A. Robinett, J. Kepner
  • Mathematics, Computer Science
  • 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
  • 2019
An algorithm is presented that deterministically generates RadiX-Nets: sparse DNN topologies that, as a whole, are much more diverse than X-Net topologies, while preserving X-Net's desired characteristics.
Write Quick, Run Fast: Sparse Deep Neural Network in 20 Minutes of Development Time via SuiteSparse:GraphBLAS
SuiteSparse:GraphBLAS is a full implementation of the GraphBLAS standard, which provides a powerful and expressive framework for creating graph algorithms based on the elegant mathematics of sparse…
Update on Triangle Counting on GPU
This work presents an update to the triangle-counting portion of the subgraph isomorphism static graph challenge, improving single-GPU kernel performance by introducing a work-stealing dynamic GPU kernel with persistent threads, which makes performance adaptive for large graphs without requiring a graph analysis phase.
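The dynamic work-distribution idea behind persistent threads can be sketched in miniature: instead of a fixed static partition, long-lived workers repeatedly claim the next chunk of work from a shared counter until the work runs out. The snippet below is a hypothetical Python analogue of the GPU kernel's atomic counter, with squaring as stand-in work:

```python
import threading

NUM_ITEMS, CHUNK, NUM_WORKERS = 100, 8, 4
next_index = 0                 # shared counter (atomicAdd on a GPU)
lock = threading.Lock()
results = []

def worker():
    global next_index
    while True:
        with lock:             # claim the next chunk of items
            start = next_index
            next_index += CHUNK
        if start >= NUM_ITEMS:
            return             # no work left: the worker retires
        end = min(start + CHUNK, NUM_ITEMS)
        local = [i * i for i in range(start, end)]  # stand-in "work"
        with lock:
            results.extend(local)

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))  # 100
```

Because chunks are claimed on demand, a worker that draws light items simply grabs more chunks, which is what makes performance adaptive for irregular graphs without a separate analysis phase.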