Non-Structured DNN Weight Pruning—Is It Beneficial in Any Platform?

Xiaolong Ma, Sheng Lin, Shaokai Ye, Zhezhi He, Linfeng Zhang, Geng Yuan, Sia Huat Tan, Z. Li, Deliang Fan, Xuehai Qian, X. Lin, Kaisheng Ma, Yanzhi Wang. IEEE Transactions on Neural Networks and Learning Systems.

Large deep neural network (DNN) models pose a key challenge to energy efficiency, because off-chip DRAM accesses consume significantly more energy than arithmetic or SRAM operations. This motivates intensive research on model compression, with two main approaches. Weight pruning leverages the redundancy in the number of weights and can be performed in a non-structured manner, which offers higher flexibility and pruning rates but incurs index accesses due to irregular weight positions, or a structured…

Lottery Aware Sparsity Hunting: Enabling Federated Learning on Resource-Limited Edge

This paper proposes federated lottery aware sparsity hunting (FLASH), a unified sparse learning framework that enables the server to win a lottery, in the sense of yielding a sparse sub-model able to maintain classification performance under highly resource-limited client settings.

A Flexible Yet Efficient DNN Pruning Approach for Crossbar-Based Processing-in-Memory Architectures

A projection-based shape voting algorithm to select suitable segment shapes to drive the weight pruning process is proposed and a low-overhead data path is introduced that can be easily integrated into any existing ReRAM-based DNN accelerator, achieving a high pruning ratio and a high execution efficiency.

Reduce Computing Complexity of Deep Neural Networks Through Weight Scaling

This paper introduces the Scaling-Weight-based Convolution (SWC) technique to reduce DNN model size along with the complexity and number of arithmetic operations; a scaling and quantized network-acceleration processor (SQNAP) based on the SWC method is also proposed to achieve high speed and low power with reduced memory accesses.

ESCALATE: Boosting the Efficiency of Sparse CNN Accelerator with Kernel Decomposition

The ever-growing parameter size and computation cost of Convolutional Neural Network (CNN) models hinder their deployment onto resource-constrained platforms. Network pruning techniques are proposed…

Structured Pruning for Deep Convolutional Neural Networks: A Survey

The state-of-the-art structured pruning techniques with respect to filter ranking methods, regularization methods, dynamic execution, neural architecture search, the lottery ticket hypothesis, and the applications of pruning are summarized and compared.
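Of the categories above, filter ranking is the simplest to illustrate. Below is a minimal NumPy sketch of magnitude-based (L1-norm) filter ranking, a common baseline in this literature; the function names are illustrative and not taken from any specific paper:

```python
import numpy as np

def rank_filters_l1(conv_w: np.ndarray) -> np.ndarray:
    """Rank conv filters by L1 norm, least important first.
    conv_w has shape (out_channels, in_channels, kH, kW)."""
    scores = np.abs(conv_w).reshape(conv_w.shape[0], -1).sum(axis=1)
    return np.argsort(scores)

def prune_filters(conv_w: np.ndarray, ratio: float) -> np.ndarray:
    """Structured pruning: drop the lowest-ranked fraction `ratio`
    of whole filters (output channels)."""
    order = rank_filters_l1(conv_w)
    n_prune = int(ratio * conv_w.shape[0])
    keep = np.sort(order[n_prune:])
    return conv_w[keep]

# Toy layer: 8 filters over 3 input channels with 3x3 kernels.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))
pruned = prune_filters(w, ratio=0.5)
```

Because whole output channels are removed, the pruned layer stays dense and needs no index bookkeeping, which is the hardware appeal of structured pruning.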

Automatic Attention Pruning: Improving and Automating Model Pruning using Attentions

Automatic Attention Pruning (AAP), an adaptive, attention-based, structured pruning approach to automatically generate small, accurate, and hardware-efficient models that meet user objectives, is presented.

Transforming Large-Size to Lightweight Deep Neural Networks for IoT Applications

A comprehensive overview of the existing literature on DNN compression techniques that reduce energy consumption, storage, and computation requirements for IoT applications; the existing approaches are divided into five broad categories—network pruning, sparse representation, bits precision, knowledge distillation, and miscellaneous.

FFNeRV: Flow-Guided Frame-Wise Neural Representations for Videos

This work proposes FFNeRV, a novel method for incorporating flow information into frame-wise representations to exploit the temporal redundancy across frames in videos, inspired by standard video codecs; it also introduces a fully convolutional architecture, enabled by one-dimensional temporal grids, that improves the continuity of spatial features.

Randomize and Match: Exploiting Irregular Sparsity for Energy Efficient Processing in SNNs

This work proposes MISS, a fraMework that takes full advantage of Irregular Sparsity in the SNN through synergistic hardware and software co-design, and designs a sparsity-stationary dataflow that keeps sparse weights stationary in the memory to avoid the decoding overhead.

Optimization of Convolutional Neural Networks for Constrained Devices through Binarization

This paper studies binarization as a technique for optimizing convolutional neural networks for deployment on resource-constrained devices.

ImageNet classification with deep convolutional neural networks

A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
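The dropout regularizer mentioned above can be sketched in a few lines of NumPy. This is the inverted-dropout variant, which rescales at training time so that inference requires no change:

```python
import numpy as np

def dropout(x: np.ndarray, p: float, rng: np.random.Generator,
            training: bool = True) -> np.ndarray:
    """Inverted dropout: during training, zero each activation with
    probability p and rescale the survivors by 1/(1-p), so the expected
    activation is unchanged and inference is a plain identity."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(42)
acts = np.ones(10000)
train_out = dropout(acts, p=0.5, rng=rng)                  # ~half zeroed, rest 2.0
eval_out = dropout(acts, p=0.5, rng=rng, training=False)   # identity
```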

Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM

This paper focuses on compressing and accelerating deep models whose network weights are represented by very small numbers of bits, referred to as extremely low bit neural networks, and proposes to solve this problem using extragradient and iterative quantization algorithms that converge considerably faster than conventional optimization methods.
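As a rough illustration of what extremely low bit quantization means, here is a generic NumPy sketch of projecting real-valued weights onto a binary set {-a, +a}, where the closed-form optimal scale is the mean absolute weight. This is a didactic sketch, not the paper's ADMM-based algorithm:

```python
import numpy as np

def project_binary(w: np.ndarray):
    """Project weights onto the binary set {-a, +a}: the scale a that
    minimizes ||w - a*sign(w)||^2 is the mean absolute weight.
    (Entries that are exactly zero map to zero via sign.)"""
    a = np.abs(w).mean()
    return a * np.sign(w), a

q, a = project_binary(np.array([1.0, -2.0, 3.0]))
# a == 2.0, q == [2.0, -2.0, 2.0]
```

Each weight then needs only one bit plus a shared scale per layer, which is where the storage savings come from.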

Learning both Weights and Connections for Efficient Neural Network

A method to reduce the storage and computation required by neural networks by an order of magnitude, without affecting their accuracy, by learning only the important connections; redundant connections are pruned using a three-step method (train, prune, retrain).
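The pruning step of this train-prune-retrain pipeline can be sketched as magnitude-based thresholding in NumPy (a generic illustration; the helper name is ours, not the paper's):

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float):
    """Zero out the smallest-magnitude fraction `sparsity` of weights,
    returning the pruned weights and the binary mask that would be used
    to freeze pruned connections during retraining."""
    flat = np.abs(w).ravel()
    k = int(sparsity * flat.size)
    # k-th smallest magnitude serves as the pruning threshold
    thresh = np.partition(flat, k)[k] if k > 0 else -np.inf
    mask = np.abs(w) >= thresh
    return w * mask, mask

w = np.array([0.1, -0.5, 2.0, -0.05])
pruned, mask = magnitude_prune(w, sparsity=0.5)  # zeros the two smallest
```

During retraining, gradients are multiplied by the same mask so pruned connections stay at zero.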

Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers

It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.

A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers

A systematic weight pruning framework of DNNs using the alternating direction method of multipliers (ADMM) is presented, which can reduce the total computation by five times compared with the prior work and achieves a fast convergence rate.
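A toy version of ADMM-based pruning can be sketched in NumPy: alternate between optimizing the weights under an augmented Lagrangian and projecting onto the sparsity constraint (keep the k largest-magnitude entries). This is a didactic sketch on a tiny quadratic objective, not the paper's training procedure:

```python
import numpy as np

def project_topk(x: np.ndarray, k: int) -> np.ndarray:
    """Euclidean projection onto {vectors with at most k nonzeros}:
    keep the k largest-magnitude entries, zero the rest."""
    z = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    z[idx] = x[idx]
    return z

def admm_prune(w0: np.ndarray, k: int, rho: float = 1.0,
               outer: int = 50, lr: float = 0.1) -> np.ndarray:
    """Toy ADMM for: minimize ||w - w0||^2  s.t.  w has <= k nonzeros.
    W-step: gradient descent on the augmented Lagrangian;
    Z-step: top-k projection;  U-step: scaled dual update."""
    w = w0.copy()
    z = project_topk(w, k)
    u = np.zeros_like(w)
    for _ in range(outer):
        for _ in range(20):  # inexact W-step
            grad = 2.0 * (w - w0) + rho * (w - z + u)
            w = w - lr * grad
        z = project_topk(w + u, k)
        u = u + w - z
    return z  # feasible (sparse) solution

z = admm_prune(np.array([3.0, -0.2, 0.1, 2.5, -0.05]), k=2)
```

The appeal of the ADMM decomposition is that the non-convex sparsity constraint is handled entirely by the cheap projection step, while the W-step remains ordinary gradient-based training.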

Discrimination-aware Channel Pruning for Deep Neural Networks

This work investigates a simple-yet-effective method, called discrimination-aware channel pruning, to choose those channels that really contribute to discriminative power and proposes a greedy algorithm to conduct channel selection and parameter optimization in an iterative way.

Sticker: A 0.41-62.1 TOPS/W 8Bit Neural Network Processor with Multi-Sparsity Compatible Convolution Arrays and Online Tuning Acceleration for Fully Connected Layers

This work, named STICKER, systematically explores NN sparsity for both inference and online tuning operations, and develops multi-sparsity-compatible convolution PE arrays that contain a multi-mode memory supporting different sparsity patterns.

NISP: Pruning Networks Using Neuron Importance Score Propagation

  • Ruichi Yu, Ang Li, …, L. Davis
  • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
The Neuron Importance Score Propagation (NISP) algorithm is proposed to propagate the importance scores of final responses to every neuron in the network and is evaluated on several datasets with multiple CNN models and demonstrated to achieve significant acceleration and compression with negligible accuracy loss.

SCNN: An accelerator for compressed-sparse convolutional neural networks

  • A. Parashar, Minsoo Rhu, …, W. Dally
  • 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA)
The Sparse CNN (SCNN) accelerator architecture is introduced, which improves performance and energy efficiency by exploiting the zero-valued weights that stem from network pruning during training and the zero-valued activations that arise from the common ReLU operator.