• Corpus ID: 251105080

Deep Partial Updating: Towards Communication Efficient Updating for On-device Inference

Zhongnan Qu, Cong Liu, Lothar Thiele
Emerging edge intelligence applications require the server to retrain and update deep neural networks deployed on remote edge nodes to leverage newly collected data samples. Unfortunately, it may be impossible in practice to continuously send fully updated weights to these edge nodes due to the highly constrained communication resource. In this paper, we propose the weight-wise deep partial updating paradigm, which smartly selects a small subset of weights to update in each server-to-edge…
Measuring what Really Matters: Optimizing Neural Networks for TinyML
This work addresses the challenges of bringing machine learning to MCUs, focusing on the ubiquitous ARM Cortex-M architecture, and proposes an implementation-aware design as a cost-effective method for verification and benchmarking.


Lifelong Learning with Dynamically Expandable Networks
The resulting network, fine-tuned on all tasks, achieved significantly better performance than the batch models, which shows that it can be used to estimate the optimal network structure even when all tasks are available in the first place.
Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds
This work designs a new algorithm for batch active learning with deep neural network models that samples groups of points that are disparate and high-magnitude when represented in a hallucinated gradient space, and shows that while other approaches sometimes succeed for particular batch sizes or architectures, BADGE consistently performs as well or better, making it a versatile option for practical active learning problems.
Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
This paper finds that 99.9% of the gradient exchange in distributed SGD is redundant, and proposes Deep Gradient Compression (DGC) to greatly reduce the communication bandwidth, which enables large-scale distributed training on inexpensive commodity 1Gbps Ethernet and facilitates distributed training on mobile.
The intriguing role of module criticality in the generalization of deep networks
This work formulates how generalization relates to module criticality, and shows that this measure is able to explain the superior generalization performance of some architectures over others, whereas earlier measures fail to do so.
Adaptive Group Sparse Regularization for Continual Learning
Extensive experimental results show that AGS-CL uses much less additional memory space for storing the regularization parameters, and that it significantly outperforms several state-of-the-art baselines on representative continual learning benchmarks for both supervised and reinforcement learning tasks.
Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding
This work introduces "deep compression", a three-stage pipeline of pruning, trained quantization and Huffman coding, whose stages work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
BinaryConnect: Training Deep Neural Networks with binary weights during propagations
BinaryConnect is introduced, a method which consists in training a DNN with binary weights during the forward and backward propagations, while retaining precision of the stored weights in which gradients are accumulated, and near state-of-the-art results with BinaryConnect are obtained on the permutation-invariant MNIST, CIFAR-10 and SVHN.
Deep Residual Learning for Image Recognition
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Privacy-preserving deep learning
  • R. Shokri, Vitaly Shmatikov
  • Computer Science
  • 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton)
  • 2015
This paper presents a practical system that enables multiple parties to jointly learn an accurate neural-network model for a given objective without sharing their input datasets, and exploits the fact that the optimization algorithms used in modern deep learning, namely, those based on stochastic gradient descent, can be parallelized and executed asynchronously.
Very Deep Convolutional Networks for Large-Scale Image Recognition
This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.