Sparse-Matrix Compression Primitives with OpenCL Framework to Support Halide

Chao-Lin Lee, Chen-Ting Chao, Jenq-Kuen Lee, Chung-Wen Huang, and Ming-Yu Hung. Sparse-Matrix Compression Primitives with OpenCL Framework to Support Halide. In Proceedings of the International Workshop on OpenCL.
Halide and OpenCL now play important roles in heterogeneous multi-core computing. OpenCL provides vendor-level support, while Halide provides domain-specific support for areas such as vision processing and AI models (e.g., the TVM Halide IR). Halide also provides flexible scheduling for applications on target machines, with OpenCL playing a supporting role in Halide environments. In this work, we investigate the research issues in supporting sparse computation in Halide and the corresponding OpenCL support. We present… 
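The excerpt cuts off before the paper's actual compression primitives. As a rough illustration of the kind of sparse-matrix compression involved, the sketch below converts a dense matrix to compressed sparse row (CSR) form and applies it in a matrix-vector product, in plain Python. The function names and layout here are illustrative assumptions, not the paper's API.

```python
def dense_to_csr(matrix):
    """Compress a dense 2-D list into CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:                   # keep only nonzero entries
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))      # end offset of this row in `values`
    return values, col_idx, row_ptr

def csr_spmv(values, col_idx, row_ptr, x):
    """Sparse matrix-vector product y = A @ x using the CSR arrays."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y
```

For `[[5,0,0],[0,0,3],[1,2,0]]` this stores only the four nonzeros plus index arrays; an OpenCL kernel would parallelize the outer loop over rows.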


Accelerating NNEF Framework on OpenCL Devices Using clDNN

A translator that converts NNEF to clDNN, an open-source inference SDK that provides high-performance computation APIs on Intel hardware platforms; the translated models run up to six times faster than the C implementation in the execution time of MobileNet_v1.

Support Convolution of CNN with Compression Sparse Matrix Multiplication Flow in TVM

A flow in TVM is proposed that provides sparse convolution with weight pruning: it maximizes sparsity by pruning certain weights and retraining the model, and achieves an 11.42x speedup on average over the original flow on ImageNet-based models.
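The pruning-then-compression idea behind this flow can be sketched in a few lines: zero out small-magnitude weights, store the result in CSR form, and perform the convolution's matrix multiplication on the compressed operand. This is a minimal sketch of the general technique, not the TVM flow itself; all names are illustrative.

```python
def prune(weights, threshold):
    """Magnitude pruning: zero out entries with |w| < threshold."""
    return [[w if abs(w) >= threshold else 0 for w in row] for row in weights]

def dense_to_csr(matrix):
    """Compress a dense 2-D list into CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matmul(values, col_idx, row_ptr, dense):
    """C = A @ B with A in CSR form and B a dense matrix (list of rows)."""
    n_cols = len(dense[0])
    out = []
    for r in range(len(row_ptr) - 1):
        acc = [0] * n_cols
        for k in range(row_ptr[r], row_ptr[r + 1]):
            v, j = values[k], col_idx[k]
            for c in range(n_cols):
                acc[c] += v * dense[j][c]
        out.append(acc)
    return out
```

In a convolution lowered to matrix multiplication (im2col), the pruned weight matrix is the CSR operand, so work scales with the number of surviving weights rather than the dense size.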

Devise Sparse Compression Schedulers to Enhance FastText Methods

This paper adjusts the software architecture of FastText, pre-processes the pre-trained model offline, and introduces a new acceleration method based on sparse matrix compression in Halide, improving performance by compressing the matrix.

Efficient sparse-matrix multi-vector product on GPUs

An in-depth analysis contrasting SpMV and SpMM is presented, along with a new sparse-matrix representation and computation approach designed for high data-movement efficiency and effective GPU parallelization of SpMM.
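The SpMV/SpMM contrast can be made concrete with a toy CSR kernel; this is a sketch of the general idea, not the paper's GPU representation. In SpMM, each nonzero of the sparse matrix is loaded once and reused across every right-hand-side vector, which is where the data-movement savings over repeated SpMV come from.

```python
def spmv(values, col_idx, row_ptr, x):
    """SpMV: one CSR matrix times one dense vector."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

def spmm(values, col_idx, row_ptr, X):
    """SpMM: the same CSR matrix times several vectors (columns of X).
    Each nonzero (values[k], col_idx[k]) is read once and reused for every
    column, instead of being re-read per vector as in repeated SpMV."""
    n_cols = len(X[0])
    out = []
    for r in range(len(row_ptr) - 1):
        acc = [0] * n_cols
        for k in range(row_ptr[r], row_ptr[r + 1]):
            v, j = values[k], col_idx[k]
            for c in range(n_cols):
                acc[c] += v * X[j][c]
        out.append(acc)
    return out
```

Column i of the SpMM result equals SpMV applied to column i of X; only the memory-access pattern differs, which is precisely what matters on GPUs.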

Decoupling algorithms from schedules for easy optimization of image processing pipelines

This work proposes a representation for feed-forward imaging pipelines that separates the algorithm from its schedule, enabling high performance without sacrificing code clarity, and demonstrates the power of this representation by expressing a range of recent image processing applications in an embedded domain-specific language called Halide and compiling them for ARM, x86, and GPUs.

Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines

This paper presents a systematic model of the tradeoff space fundamental to stencil pipelines, a schedule representation that describes concrete points in this space for each stage of an image processing pipeline, and an optimizing compiler for the Halide image processing language that synthesizes high-performance implementations from a Halide algorithm and a schedule.
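Halide expresses this algorithm/schedule separation in a C++-embedded DSL; as a language-neutral toy analogy (not Halide syntax), the same pure "algorithm" can be executed under different loop "schedules" that traverse the space differently yet produce identical results. Everything named here is an illustrative assumption.

```python
def algorithm(x, y):
    """The *algorithm*: a pure definition of the output value at each point."""
    return (x + 1) * (y + 2)

def schedule_row_major(width, height, f):
    """One *schedule*: plain row-major traversal."""
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            out[y][x] = f(x, y)
    return out

def schedule_tiled(width, height, f, tile=2):
    """Another *schedule*: tiled traversal for locality (cf. Halide's tile())."""
    out = [[0] * width for _ in range(height)]
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            for y in range(ty, min(ty + tile, height)):
                for x in range(tx, min(tx + tile, width)):
                    out[y][x] = f(x, y)
    return out
```

Because the algorithm never mentions traversal order, the schedule can be swapped (tiled, vectorized, parallelized) without risking correctness, which is the core claim of the Halide design.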


A new programming language for image processing pipelines, called Halide, that separates the algorithm from its schedule, and is expressive enough to describe organizations that match or outperform state-of-the-art hand-written implementations of many computational photography and computer vision algorithms.