Fully Decoupled Neural Network Learning Using Delayed Gradients

@article{Zhuang2021FullyDN,
  title={Fully Decoupled Neural Network Learning Using Delayed Gradients},
  author={Huiping Zhuang and Yi Wang and Qinglai Liu and Zhiping Lin},
  journal={IEEE transactions on neural networks and learning systems},
  year={2021},
  volume={PP}
}
  • Huiping Zhuang, Yi Wang, Qinglai Liu, Zhiping Lin
  • Published 2021
  • Medicine, Computer Science
  • IEEE transactions on neural networks and learning systems
Training neural networks with backpropagation (BP) requires a sequential passing of activations and gradients, which gives rise to the forward, backward, and update lockings among modules (each module contains a stack of layers) inherited from BP. In this brief, we propose a fully decoupled training scheme using delayed gradients (FDG) to break all these lockings. The FDG splits a neural network into multiple modules and trains them independently and…
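To make the decoupling concrete, the following is a minimal, single-process sketch of a two-module split trained with a one-step gradient delay. The module sizes, the toy regression task, the one-step delay, and all variable names are illustrative assumptions; the actual FDG runs the modules asynchronously in parallel rather than emulating the delay in one loop.

```python
# Minimal sketch: a two-module split where module 1 is updated with a
# gradient that is one step old (a delayed gradient), so in an asynchronous
# implementation it would not have to wait for module 2. Illustrative only.
import torch
import torch.nn as nn

torch.manual_seed(0)
module1 = nn.Sequential(nn.Linear(10, 32), nn.ReLU())
module2 = nn.Linear(32, 1)
opt1 = torch.optim.SGD(module1.parameters(), lr=0.1)
opt2 = torch.optim.SGD(module2.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

pending = None  # module 1's parameter gradients from the previous step

for step in range(200):
    x, y = torch.randn(16, 10), torch.randn(16, 1)

    # Module 1 forward; the detached copy stops module 2's backward pass
    # at the module boundary.
    h = module1(x)
    h_detached = h.detach().requires_grad_(True)

    # Module 2 trains immediately on the current activation.
    loss = loss_fn(module2(h_detached), y)
    opt2.zero_grad()
    loss.backward()
    opt2.step()

    # Backpropagate the boundary gradient through module 1, but only store
    # the resulting parameter gradients for now.
    opt1.zero_grad()
    h.backward(h_detached.grad)
    new_grads = [p.grad.detach().clone() for p in module1.parameters()]

    # Update module 1 with the gradients produced at the PREVIOUS step:
    # its weights move using a one-step-stale (delayed) gradient.
    if pending is not None:
        for p, g in zip(module1.parameters(), pending):
            p.grad = g
        opt1.step()
    pending = new_grads
```

The delayed gradient applied here was computed at an older parameter state, which is exactly the staleness that the convergence analyses in this line of work have to handle.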
Accumulated Decoupled Learning with Gradient Staleness Mitigation for Convolutional Neural Networks
TLDR: This paper proposes an accumulated decoupled learning (ADL), which includes a module-wise gradient accumulation in order to mitigate the gradient staleness, and quantifies the staleness in such a way that its mitigation can be quantitatively visualized.
Accumulated Decoupled Learning: Mitigating Gradient Staleness in Inter-Layer Model Parallelization
TLDR: An accumulated decoupled learning (ADL) which incorporates the gradient accumulation technique to mitigate the stale gradient effect is proposed, and it is proved that the proposed method can converge to critical points, i.e., the gradients converge to 0, in spite of its asynchronous nature.
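The gradient-accumulation ingredient can be pictured with a standard accumulate-then-step loop: fewer parameter updates per wall-clock window mean a delayed gradient is stale by fewer updates. This is a generic sketch of the accumulation mechanism under assumed names and an assumed K, not the ADL reference code.

```python
# Generic gradient accumulation: one optimizer step per K micro-batches,
# which reduces how many parameter updates a delayed gradient lags behind.
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
K = 4  # micro-batches accumulated per update (illustrative)

opt.zero_grad()
for step in range(100):
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = loss_fn(model(x), y) / K   # scale so the accumulated grad is an average
    loss.backward()                   # gradients sum into .grad
    if (step + 1) % K == 0:
        opt.step()                    # single update per K micro-batches
        opt.zero_grad()
```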
Towards Quantized Model Parallelism for Graph-Augmented MLPs Based on Gradient-Free ADMM framework
TLDR: A parallel deep learning Alternating Direction Method of Multipliers (pdADMM) framework is proposed to achieve model parallelism: parameters in each layer of GA-MLP models can be updated in parallel.
Distributed Hierarchical Sentence Embeddings for Unsupervised Extractive Text Summarization
  • Guanjie Huang, Hong Shen
  • 2021 6th International Conference on Big Data and Computing
  • 2021
Unsupervised text summarization is a promising approach that avoids human efforts in generating reference summaries, which is particularly important for large-scale datasets. To improve its…
Approximate to Be Great: Communication Efficient and Privacy-Preserving Large-Scale Distributed Deep Learning in Internet of Things
TLDR: A communication-efficient and privacy-preserving framework is designed to enable different participants to distributively learn a model with a privacy protection guarantee, and a differentially private approximate mechanism for distributed deep learning is developed.
Pipelined Backpropagation at Scale: Training Large Models without Batches
TLDR: This work evaluates the use of small-batch, fine-grained Pipelined Backpropagation, an asynchronous pipeline-parallel training algorithm that has significant hardware advantages, and introduces two methods, Spike Compensation and Linear Weight Prediction, that effectively mitigate the downsides caused by the asynchronicity of Pipelined Backpropagation and outperform existing techniques in this setting.
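As a rough picture of the weight-prediction side, one simple form extrapolates a stage's weights along its momentum (velocity) direction by the pipeline delay before running the forward pass. The function below is an assumption-laden sketch of that general idea, not the exact scheme from the paper.

```python
# Linear weight prediction (sketch): run the forward pass of a pipeline
# stage on weights extrapolated `delay` momentum-SGD updates into the
# future, so the delayed gradient better matches the weights it will update.
import torch

def predict_weights(weight: torch.Tensor, velocity: torch.Tensor,
                    lr: float, delay: int) -> torch.Tensor:
    # Approximate each of the next `delay` updates as -lr * velocity.
    return weight - lr * delay * velocity

w = torch.randn(4, 4)          # current weights of the stage
v = 0.01 * torch.randn(4, 4)   # running momentum buffer (illustrative)
w_hat = predict_weights(w, v, lr=0.1, delay=3)
```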
Toward Model Parallelism for Deep Neural Network based on Gradient-free ADMM Framework
TLDR: This paper proposes a novel parallel deep learning ADMM framework (pdADMM) to achieve layer parallelism: parameters in each layer of neural networks can be updated independently in parallel.

References

Showing 1–10 of 33 references
Decoupled Greedy Learning of CNNs
TLDR: Decoupled Greedy Learning, based on a greedy relaxation of the joint training objective, is considered; it has recently been shown to be effective in the context of Convolutional Neural Networks (CNNs) on large-scale image classification, and it is shown that it can lead to better generalization than sequential greedy optimization.
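The greedy, module-wise recipe can be sketched as follows: each module is paired with a small auxiliary classifier, trains only on that local loss, and passes a detached activation forward, so no global backward pass is needed. Sizes, heads, and data are illustrative assumptions.

```python
# Greedy module-wise training with local auxiliary classifiers (sketch).
import torch
import torch.nn as nn

modules = nn.ModuleList([
    nn.Sequential(nn.Linear(20, 64), nn.ReLU()),
    nn.Sequential(nn.Linear(64, 64), nn.ReLU()),
])
heads = nn.ModuleList([nn.Linear(64, 10), nn.Linear(64, 10)])  # local objectives
opts = [torch.optim.SGD(list(m.parameters()) + list(h.parameters()), lr=0.1)
        for m, h in zip(modules, heads)]
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    x = torch.randn(32, 20)
    y = torch.randint(0, 10, (32,))
    h = x
    for module, head, opt in zip(modules, heads, opts):
        h = module(h)                  # forward through this module
        loss = loss_fn(head(h), y)     # purely local loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        h = h.detach()                 # no gradient flows to earlier modules
```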
Training Neural Networks Using Features Replay
TLDR: This work proposes a novel parallel-objective formulation for the objective function of the neural network, introduces the features replay algorithm, and proves that it is guaranteed to converge to critical points for the non-convex problem under certain conditions.
Decoupled Parallel Backpropagation with Convergence Guarantee
TLDR: A decoupled parallel backpropagation algorithm for deep learning optimization with a convergence guarantee is proposed, and it is proved that the method guarantees convergence to critical points for the non-convex problem.
Decoupled Neural Interfaces using Synthetic Gradients
TLDR: It is demonstrated that in addition to predicting gradients, the same framework can be used to predict inputs, resulting in models which are decoupled in both the forward and backwards pass, amounting to independent networks which co-learn such that they can be composed into a single functioning corporation.
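The core mechanism can be sketched with a small gradient predictor: the upstream module updates immediately using a predicted gradient of the loss with respect to its output, and the predictor is regressed onto the true gradient once it becomes available. Layer sizes, the MSE task, and the linear predictor are illustrative assumptions.

```python
# Synthetic gradients (sketch): a predictor `sg` estimates dL/dh so the
# backbone can update without waiting for the true backward pass.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(10, 32), nn.ReLU())
head = nn.Linear(32, 1)
sg = nn.Linear(32, 32)  # maps the activation h to a predicted gradient
opt_backbone = torch.optim.SGD(backbone.parameters(), lr=0.1)
opt_rest = torch.optim.SGD(list(head.parameters()) + list(sg.parameters()), lr=0.1)
loss_fn = nn.MSELoss()

for step in range(200):
    x, y = torch.randn(16, 10), torch.randn(16, 1)
    h = backbone(x)
    h_detached = h.detach().requires_grad_(True)

    # 1) Update the backbone right away with the *predicted* gradient.
    synthetic_grad = sg(h_detached).detach()
    opt_backbone.zero_grad()
    h.backward(synthetic_grad)
    opt_backbone.step()

    # 2) Compute the true loss and gradient w.r.t. h, and train the
    #    predictor to match that true gradient.
    loss = loss_fn(head(h_detached), y)
    true_grad = torch.autograd.grad(loss, h_detached, retain_graph=True)[0]
    sg_loss = loss_fn(sg(h_detached), true_grad.detach())
    opt_rest.zero_grad()
    (loss + sg_loss).backward()
    opt_rest.step()
```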
Asynchronous Stochastic Gradient Descent with Delay Compensation
TLDR: The proposed algorithm is evaluated on CIFAR-10 and ImageNet datasets, and the experimental results demonstrate that DC-ASGD outperforms both synchronous SGD and asynchronous SGD, and nearly approaches the performance of sequential SGD.
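For reference, the compensation can be sketched as a first-order correction of the stale gradient using an element-wise (outer-product) approximation of the Hessian; the helper below is an illustrative rendering with an assumed name and an assumed default for the compensation strength.

```python
# Delay compensation (sketch): adjust a gradient g computed at stale
# weights w_bak toward the gradient at the current weights w_t, using
# g * g as a cheap diagonal Hessian approximation.
import torch

def delay_compensate(g: torch.Tensor, w_t: torch.Tensor,
                     w_bak: torch.Tensor, lam: float = 0.04) -> torch.Tensor:
    return g + lam * g * g * (w_t - w_bak)
```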
Training Neural Networks with Local Error Signals
TLDR: It is demonstrated, for the first time, that layer-wise training can approach the state of the art on a variety of image datasets, and that a completely backprop-free variant outperforms previously reported results among methods aiming for higher biological plausibility.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
TLDR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
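For completeness, the transform itself normalizes each feature over the mini-batch and then applies a learnable scale and shift; below is a plain reimplementation for illustration (in practice one would use torch.nn.BatchNorm1d).

```python
# Batch normalization of a mini-batch of feature vectors (training-mode
# statistics only; running averages for inference are omitted).
import torch

def batch_norm(x, gamma, beta, eps=1e-5):
    mean = x.mean(dim=0, keepdim=True)                 # per-feature batch mean
    var = x.var(dim=0, unbiased=False, keepdim=True)   # per-feature batch variance
    x_hat = (x - mean) / torch.sqrt(var + eps)         # normalize
    return gamma * x_hat + beta                        # learnable scale and shift

x = torch.randn(32, 8)
y = batch_norm(x, gamma=torch.ones(8), beta=torch.zeros(8))
```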
Densely Connected Convolutional Networks
TLDR: The Dense Convolutional Network (DenseNet) connects each layer to every other layer in a feed-forward fashion and has several compelling advantages: it alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and substantially reduces the number of parameters.
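The connectivity pattern is easy to sketch: every layer receives the concatenation of all preceding feature outputs. The block below uses linear layers instead of the paper's convolutional blocks as an illustrative simplification.

```python
# DenseNet-style connectivity (sketch): each layer consumes the
# concatenation of the block input and all earlier layer outputs.
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_features, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Sequential(nn.Linear(in_features + i * growth_rate, growth_rate),
                          nn.ReLU())
            for i in range(num_layers)
        ])

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

block = DenseBlock(in_features=16, growth_rate=8, num_layers=3)
out = block(torch.randn(4, 16))   # shape: (4, 16 + 3 * 8)
```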
Deep Supervised Learning Using Local Errors
TLDR: The proposed learning mechanism, based on fixed, broad, and random tuning of each neuron to the classification categories, outperforms the biologically-motivated feedback alignment learning technique on the CIFAR-10 dataset, approaching the performance of standard backpropagation.
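The mechanism can be sketched as a layer trained against a local loss computed through a frozen, randomly initialized readout, so no error signal has to be propagated from other layers. Sizes and the single layer shown are illustrative assumptions.

```python
# Local error learning with a fixed random readout (sketch): only `layer`
# is trained; the readout stays at its random initialization.
import torch
import torch.nn as nn

layer = nn.Linear(20, 64)
fixed_readout = nn.Linear(64, 10)
for p in fixed_readout.parameters():
    p.requires_grad_(False)            # random and never updated
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    x = torch.randn(32, 20)
    y = torch.randint(0, 10, (32,))
    h = torch.relu(layer(x))
    loss = loss_fn(fixed_readout(h), y)  # purely local error signal
    opt.zero_grad()
    loss.backward()
    opt.step()
```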
Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures
TLDR: Results are presented on scaling up biologically motivated models of deep learning to datasets which need deep networks with appropriate architectures to achieve good performance, and implementation details are provided that help establish baselines for biologically motivated deep learning schemes going forward.