• Corpus ID: 239998338

MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge

@inproceedings{yuan2021mest,
  title={MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge},
  author={Geng Yuan and Xiaolong Ma and Wei Niu and Zhengang Li and Zhenglun Kong and Ning Liu and Yifan Gong and Zheng Zhan and Chaoyang He and Qing Jin and Siyue Wang and Minghai Qin and Bin Ren and Yanzhi Wang and Sijia Liu and Xue Lin},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2021}
}
Recently, a new trend of exploring sparsity to accelerate neural network training has emerged, embracing the paradigm of training on the edge. This paper proposes a novel Memory-Economic Sparse Training (MEST) framework targeting accurate and fast execution on edge devices. The proposed MEST framework consists of enhancements by Elastic Mutation (EM) and Soft Memory Bound (&S) that ensure superior accuracy at high sparsity ratios. Different from the existing works for sparse training… 
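The abstract describes a prune-and-grow style of dynamic sparse training. A minimal sketch of one such mutation step is below; every name and detail here is an illustrative assumption about this family of methods, not MEST's actual implementation:

```python
import numpy as np

def elastic_mutation_step(weights, mask, mutate_frac=0.1, rng=None):
    """One hypothetical prune-and-grow mutation on a flat weight vector.

    Deactivates the smallest-magnitude active weights and activates an
    equal number of currently-zero positions, so the total number of
    nonzeros (the memory bound) stays constant.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    active = np.flatnonzero(mask)
    inactive = np.flatnonzero(mask == 0)
    k = min(int(round(len(active) * mutate_frac)), len(inactive))
    if k == 0:
        return mask
    # Prune: the k active weights with the smallest magnitude.
    prune = active[np.argsort(np.abs(weights[active]))[:k]]
    # Grow: k randomly chosen currently-inactive positions.
    grow = rng.choice(inactive, size=k, replace=False)
    new_mask = mask.copy()
    new_mask[prune] = 0
    new_mask[grow] = 1
    return new_mask
```

A soft memory bound, as the abstract's &S enhancement suggests, would relax the hard equality between pruned and grown weights during training; that relaxation is not modeled in this sketch.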


FedCV: A Federated Learning Framework for Diverse Computer Vision Tasks
This work proposes a federated learning library and benchmarking framework, named FedCV, to evaluate FL on the three most representative computer vision tasks: image classification, image segmentation, and object detection, and suggests that there are multiple challenges that deserve future exploration.
Federated Learning for Internet of Things: Applications, Challenges, and Opportunities
The opportunities and challenges of FL for IoT platforms, as well as how it can enable future IoT applications, are discussed, including the possibility of orders of magnitude more endpoints brought by 5G/6G.
Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration
  • Yifan Gong, Geng Yuan, Zheng Zhan, Wei Niu, …, Northeastern University, USA


PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-time Execution on Mobile Devices
PCONV, a novel compiler-assisted DNN inference framework, is introduced; it executes PCONV models in real time without accuracy compromise, which cannot be achieved in prior work.
PruneTrain: fast neural network training by dynamic sparse model reconfiguration
This work proposes PruneTrain, a cost-efficient mechanism that gradually reduces the computational cost during training by using a structured group-lasso regularization approach that drives the optimization toward both high accuracy and small weight values.
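The group-lasso idea this entry mentions can be sketched as a penalty that sums the L2 norms of weight groups; the function name, shapes, and default value below are illustrative assumptions, not the paper's API:

```python
import numpy as np

def group_lasso_penalty(weight, axis=0, lam=1e-4):
    """Sum of L2 norms over weight groups (e.g. output channels).

    Unlike a plain L1 penalty on individual weights, this drives whole
    groups toward exactly zero, so they can later be removed as
    structured units (the premise of PruneTrain-style reconfiguration).
    """
    group_norms = np.sqrt((weight ** 2).sum(axis=axis))
    return lam * group_norms.sum()
```

In training, this penalty would simply be added to the task loss; groups whose norm collapses to zero can be dropped from the model.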
GRIM: A General, Real-Time Deep Learning Inference Framework for Mobile Devices based on Fine-Grained Structured Weight Sparsity
  • Wei Niu, Zhengang Li, +6 authors Bin Ren
  • Computer Science, Medicine
    IEEE transactions on pattern analysis and machine intelligence
  • 2021
A novel mobile inference acceleration framework, GRIM, is designed that is general to both convolutional neural networks (CNNs) and recurrent neural networks (RNNs), and that achieves real-time execution and high accuracy by leveraging fine-grained structured sparse model inference and compiler optimizations for mobile devices.
Accelerating Sparse CNN Inference on GPUs with Performance-Aware Weight Pruning
The experimental results with five real-world pruned CNN models show that the techniques can significantly improve the layer-wise performance of sparse convolution operations as well as the end-to-end performance of CNN inference.
Learning Structured Sparsity in Deep Neural Networks
The results show that for CIFAR-10, regularization on layer depth can reduce a 20-layer Deep Residual Network to 18 layers while improving the accuracy from 91.25% to 92.60%, which is still slightly higher than that of the original ResNet with 32 layers.
PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning
The proposed PatDNN is an end-to-end framework to efficiently execute DNN on mobile devices with the help of a novel model compression technique---pattern-based pruning based on an extended ADMM solution framework---and a set of thorough architecture-aware compiler/code generation-based optimizations, i.e., filter kernel reordering, compressed weight storage, register load redundancy elimination, and parameter auto-tuning.
Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization
This work suggests that exploring structural degrees of freedom during training is more effective than adding extra parameters to the network, and outperforms previous static and dynamic reparameterization methods, yielding the best accuracy for a fixed parameter budget.
Sparse Networks from Scratch: Faster Training without Losing Performance
This work develops sparse momentum, an algorithm which uses exponentially smoothed gradients (momentum) to identify layers and weights which reduce the error efficiently and shows that the benefits of momentum redistribution and growth increase with the depth and size of the network.
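The momentum-redistribution idea this summary describes, allocating a regrowth budget across layers according to smoothed gradient magnitudes, can be sketched as follows; the function name and budgeting details are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def redistribute_growth(momenta, total_grow):
    """Split a global weight-regrowth budget across layers in
    proportion to each layer's mean absolute (exponentially
    smoothed) gradient.

    momenta: list of per-layer momentum arrays.
    total_grow: total number of weights to regrow this step.
    Flooring can leave a small unallocated remainder, which a full
    implementation would hand out separately.
    """
    scores = np.array([np.abs(m).mean() for m in momenta])
    shares = scores / scores.sum()
    return np.floor(shares * total_grow).astype(int)
```

Layers whose gradients shrink the error most efficiently receive more of the budget, which matches the summary's claim that the benefit grows with network depth and size.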
Compressing Neural Networks using the Variational Information Bottleneck
This paper focuses on pruning individual neurons, which can simultaneously trim model size, FLOPs, and run-time memory, and utilizes the information bottleneck principle instantiated via a tractable variational bound to improve upon the performance of existing compression algorithms.
StructADMM: Achieving Ultrahigh Efficiency in Structured Pruning for DNNs.
This work proposes a unified, systematic framework of structured weight pruning for DNNs that incorporates stochastic gradient descent with alternating direction method of multipliers (ADMM) and can be understood as a dynamic regularization method in which the regularization target is analytically updated in each iteration.
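The ADMM formulation this summary describes alternates SGD on the loss with a projection onto the sparsity constraint set. A minimal sketch of that projection subproblem for an unstructured L0 constraint is below (structured variants would project whole groups instead); names are illustrative, not the paper's API:

```python
import numpy as np

def project_sparse(z, k):
    """Euclidean projection of z onto {x : ||x||_0 <= k}.

    Keeps the k largest-magnitude entries of z and zeroes the rest,
    which is the closed-form solution of the constrained ADMM
    subproblem for an unstructured sparsity budget.
    """
    out = np.zeros_like(z)
    if k > 0:
        idx = np.argsort(np.abs(z))[-k:]
        out[idx] = z[idx]
    return out
```

In an ADMM loop, this projection would be applied to the auxiliary variable each iteration, while the dual update analytically adjusts the regularization target, as the summary notes.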