• Corpus ID: 231847094

Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch

Aojun Zhou, Yukun Ma, Junnan Zhu, Jianbo Liu, Zhijie Zhang, Kun Yuan, Wenxiu Sun, Hongsheng Li

Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate models in resource-constrained environments. It can be generally categorized into unstructured fine-grained sparsity, which zeroes out multiple individual weights distributed across the neural network, and structured coarse-grained sparsity, which prunes blocks of sub-networks of a neural network. Fine-grained sparsity can achieve a high compression ratio but is not hardware friendly and hence receives…
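
The N:M pattern named in the title is commonly defined as allowing at most N nonzero weights in every group of M consecutive weights. The following is a minimal sketch of that constraint, assuming a simple magnitude-based selection rule; the function name `nm_mask` and the example values are hypothetical, and this is not claimed to be the training procedure of the paper.

```python
from typing import List


def nm_mask(weights: List[float], n: int = 2, m: int = 4) -> List[int]:
    """Return a 0/1 mask keeping the n largest-magnitude weights per group of m.

    Magnitude-based selection is an illustrative assumption here.
    """
    if len(weights) % m != 0:
        raise ValueError("length of weights must be a multiple of m")
    mask = [0] * len(weights)
    for start in range(0, len(weights), m):
        group = range(start, start + m)
        # Sort indices by descending |w|; ties are broken by position.
        keep = sorted(group, key=lambda i: (-abs(weights[i]), i))[:n]
        for i in keep:
            mask[i] = 1
    return mask


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.25, 0.01]
mask = nm_mask(row, n=2, m=4)
pruned = [w * k for w, k in zip(row, mask)]
print(mask)  # [1, 0, 0, 1, 0, 1, 1, 0]
```

Each group of four keeps exactly two weights, so the pattern is fine-grained (individual weights are chosen) yet regular (every group has the same budget).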


Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks

A novel transposable fine-grained sparsity mask is suggested, which guarantees that both the weight matrix and its transpose follow the same sparsity pattern; thus, the matrix multiplication required for passing the error backward can also be accelerated.
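
The transposable property described above can be checked directly: a mask qualifies when both its rows and its columns satisfy the N:M budget. The sketch below assumes 0/1 masks stored as nested lists; the helper names `rows_satisfy_nm` and `is_transposable` are hypothetical and only verify the property, they do not implement the cited method for finding such masks.

```python
from typing import List

Matrix = List[List[int]]


def rows_satisfy_nm(mask: Matrix, n: int, m: int) -> bool:
    """True if every group of m consecutive entries in each row has at most n ones."""
    for row in mask:
        if len(row) % m != 0:
            return False
        for start in range(0, len(row), m):
            if sum(row[start:start + m]) > n:
                return False
    return True


def is_transposable(mask: Matrix, n: int, m: int) -> bool:
    """True if the mask and its transpose both satisfy the N:M budget."""
    transposed = [list(col) for col in zip(*mask)]
    return rows_satisfy_nm(mask, n, m) and rows_satisfy_nm(transposed, n, m)


good = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
bad = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
print(is_transposable(good, 2, 4))  # True
print(is_transposable(bad, 2, 4))   # False: the first column holds four ones
```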

Bi-directional Masks for Efficient N:M Sparse Training

A novel method, Bi-directional Masks (Bi-Mask), with two central innovations: separate sparse masks for the forward and backward propagation directions to obtain training acceleration, and an efficient weight row permutation method to maintain performance.

Towards Fully Sparse Training: Information Restoration with Spatial Similarity

Evaluation of accuracy and efficiency shows that the proposed fully sparse training (FST) can achieve 2× training acceleration with negligible accuracy degradation on challenging large-scale classification and detection tasks.

Learning Best Combination for Efficient N:M Sparsity

This paper shows that N:M learning can be naturally characterized as a combinatorial problem that searches for the best combination candidate within a finite collection, and proves that the introduced scoring mechanism models the relative importance between combination subsets well.
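
To make the combinatorial view concrete: a group of M weights admits C(M, N) candidate subsets to keep, and some score ranks them. The sketch below enumerates the candidates for one group and, as a stand-in assumption, scores each by its summed magnitude; the cited paper uses its own scoring mechanism, which is not reproduced here. The name `best_combination` is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def best_combination(
    group: Sequence[float], n: int
) -> Tuple[List[Tuple[int, ...]], Tuple[int, ...]]:
    """Enumerate all n-of-m index subsets and return them with the top-scoring one.

    The score (sum of absolute values) is a placeholder assumption.
    """
    candidates = list(combinations(range(len(group)), n))
    best = max(candidates, key=lambda idx: sum(abs(group[i]) for i in idx))
    return candidates, best


candidates, best = best_combination([0.2, -0.8, 0.1, 0.5], n=2)
print(len(candidates))  # 6 candidates for 2:4
print(best)             # (1, 3)
```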

STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition

STEP is proposed, an Adam-aware recipe that learns N:M masks in two phases: first, STEP calculates a reliable variance estimate (precondition phase); subsequently, the variance remains fixed and is used as a precondition to learn N:M masks (mask-learning phase).

Training Recipe for N:M Structured Sparsity with Decaying Pruning Mask

This work focuses on N:M sparsity and proposes two new decay-based pruning methods, namely “pruning mask decay” and “sparse structure decay”, which consistently deliver state-of-the-art (SOTA) model accuracy, comparable to unstructured sparsity, on a Transformer-based model for a translation task.

NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM

This work proposes to formulate the NxM sparsity as a constrained optimization problem and use Alternating Direction Method of Multipliers (ADMM) to optimize the downstream tasks while taking the underlying hardware constraints into consideration, generating sparsified Transformer networks that achieve high accuracy while being able to effectively execute on newly released hardware.

DominoSearch: Find layer-wise fine-grained N:M sparse schemes from dense neural networks

A novel technique, DominoSearch, finds mixed N:M sparsity schemes from pre-trained dense deep neural networks to achieve higher accuracy than the uniform-sparsity scheme under equivalent complexity constraints (e.g. model size or FLOPs).

ClassPruning: Speed Up Image Restoration Networks by Dynamic N:M Pruning

A new solution pipeline dubbed ClassPruning is proposed that utilizes networks with different capabilities to process images with varying restoration difficulties and can help existing methods save approximately 40% FLOPs while maintaining performance.

DRESS: Dynamic REal-time Sparse Subnets

A novel training algorithm, Dynamic REal-time Sparse Subnets (DRESS), which samples multiple sub-networks from the same backbone network through row-based unstructured sparsity, and jointly trains these sub-nets in parallel with weighted loss.



Learning Structured Sparsity in Deep Neural Networks

The results show that for CIFAR-10, regularization on layer depth can reduce 20 layers of a Deep Residual Network to 18 layers while improving the accuracy from 91.25% to 92.60%, which is still slightly higher than that of the original ResNet with 32 layers.

Soft Threshold Weight Reparameterization for Learnable Sparsity

STR is a simple mechanism that learns effective sparsity budgets, in contrast with popular heuristics; it boosts accuracy over existing results by up to 10% in the ultra-sparse (99%) regime and can also be used to induce low-rank (structured sparsity) in RNNs.

The State of Sparsity in Deep Neural Networks

It is shown that unstructured sparse architectures learned through pruning cannot be trained from scratch to the same test set performance as a model trained with joint sparsification and optimization, and the need for large-scale benchmarks in the field of model compression is highlighted.

Sparse Networks from Scratch: Faster Training without Losing Performance

This work develops sparse momentum, an algorithm which uses exponentially smoothed gradients (momentum) to identify layers and weights which reduce the error efficiently and shows that the benefits of momentum redistribution and growth increase with the depth and size of the network.

Topological Insights in Sparse Neural Networks

This work proposes Neural Network Sparse Topology Distance (NNSTD), demonstrates that sparse neural networks can outperform over-parameterized models even without any further structure optimization, and shows the utility of graph theory for analyzing them.

Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization

This work suggests that exploring structural degrees of freedom during training is more effective than adding extra parameters to the network, and outperforms previous static and dynamic reparameterization methods, yielding the best accuracy for a fixed parameter budget.

Rigging the Lottery: Making All Tickets Winners

This paper introduces a method to train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy relative to existing dense-to-sparse training methods.

PCNN: Pattern-based Fine-Grained Regular Pruning Towards Optimizing CNN Accelerators

A novel index format called Sparsity Pattern Mask (SPM) is presented to encode the sparsity in PCNN, a fine-grained regular 1D pruning method that achieves a compression rate of up to 8.4× with only 0.2% accuracy loss.

Pruning Filters for Efficient ConvNets

This work presents an acceleration method for CNNs, where it is shown that even simple filter pruning techniques can reduce inference costs for VGG-16 and ResNet-110 by up to 38% on CIFAR10 while regaining close to the original accuracy by retraining the networks.

Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science

A method to design neural networks as sparse scale-free networks, which reduces the computational time required for training and inference and has the potential to enable artificial neural networks to scale up beyond what is currently possible.