# Fully Decoupled Neural Network Learning Using Delayed Gradients

@article{Zhuang2021FullyDN, title={Fully Decoupled Neural Network Learning Using Delayed Gradients}, author={Huiping Zhuang and Yi Wang and Qinglai Liu and Zhiping Lin}, journal={IEEE Transactions on Neural Networks and Learning Systems}, year={2021}, volume={PP} }

Training neural networks with backpropagation (BP) requires a sequential passing of activations and gradients. This has been recognized as the lockings (i.e., the forward, backward, and update lockings) among modules (each module contains a stack of layers) inherited from the BP. In this brief, we propose a fully decoupled training scheme using delayed gradients (FDG) to break all these lockings. The FDG splits a neural network into multiple modules and trains them independently and…
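The general idea of updating with delayed (stale) gradients can be illustrated on a toy problem. The sketch below is not the FDG algorithm from the paper, whose details are cut off in the abstract above. It only shows, under simplifying assumptions, plain gradient descent on the scalar objective f(w) = 0.5 * w^2 where each update applies a gradient computed `delay` steps earlier. The function name `delayed_gradient_descent` and all parameter values are hypothetical choices made for illustration.

```python
from collections import deque


def delayed_gradient_descent(w0, lr, delay, steps):
    """Minimize f(w) = 0.5 * w**2 with gradients that are `delay` steps stale.

    Illustrative toy only: not the FDG scheme described in the paper.
    Returns the list of iterates, starting with w0.
    """
    w = w0
    pending = deque()  # gradients computed but not yet applied
    history = [w]
    for _ in range(steps):
        # The gradient of 0.5 * w**2 at the current iterate is w itself.
        pending.append(w)
        # Apply a gradient only once it is `delay` steps old.
        if len(pending) > delay:
            stale_grad = pending.popleft()
            w -= lr * stale_grad
        history.append(w)
    return history


if __name__ == "__main__":
    iterates = delayed_gradient_descent(w0=1.0, lr=0.1, delay=3, steps=200)
    print("final iterate:", iterates[-1])
```

With `delay=0` this reduces to ordinary gradient descent. With a positive delay, the first `delay` steps leave the parameter unchanged because no gradient is old enough to apply. On this toy objective the iterates still approach 0 when the learning rate is small enough, which is the kind of behaviour that convergence analyses of delayed-gradient methods aim to establish in general.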

#### 7 Citations

Accumulated Decoupled Learning with Gradient Staleness Mitigation for Convolutional Neural Networks

- Computer Science
- ICML
- 2021

This paper proposes an accumulated decoupled learning (ADL), which includes a module-wise gradient accumulation in order to mitigate the gradient staleness, and quantifies the staleness in such a way that its mitigation can be quantitatively visualized.

Accumulated Decoupled Learning: Mitigating Gradient Staleness in Inter-Layer Model Parallelization

- Computer Science
- ArXiv
- 2020

An accumulated decoupled learning (ADL) method, which incorporates the gradient accumulation technique to mitigate the stale gradient effect, is proposed, and it is proved that the method can converge to critical points, i.e., the gradients converge to 0, in spite of its asynchronous nature.

Towards Quantized Model Parallelism for Graph-Augmented MLPs Based on Gradient-Free ADMM framework

- Computer Science, Mathematics
- ArXiv
- 2021

A parallel deep learning Alternating Direction Method of Multipliers (pdADMM) framework to achieve model parallelism: parameters in each layer of GA-MLP models can be updated in parallel.

Distributed Hierarchical Sentence Embeddings for Unsupervised Extractive Text Summarization

- 2021 6th International Conference on Big Data and Computing
- 2021

Unsupervised text summarization is a promising approach that avoids human efforts in generating reference summaries, which is particularly important for large-scale datasets. To improve its…

Approximate to Be Great: Communication Efficient and Privacy-Preserving Large-Scale Distributed Deep Learning in Internet of Things

- Computer Science
- IEEE Internet of Things Journal
- 2020

A communication-efficient and privacy-preserving framework is designed to enable different participants to distributively learn a model with a privacy protection guarantee, and a differentially private approximate mechanism for distributed deep learning is developed.

Pipelined Backpropagation at Scale: Training Large Models without Batches

- Computer Science, Mathematics
- ArXiv
- 2020

This work evaluates the use of small-batch, fine-grained Pipelined Backpropagation, an asynchronous pipeline-parallel training algorithm that has significant hardware advantages, and introduces two methods, Spike Compensation and Linear Weight Prediction, that effectively mitigate the downsides caused by the asynchronicity of Pipelined Backpropagation and outperform existing techniques in this setting.

Toward Model Parallelism for Deep Neural Network based on Gradient-free ADMM Framework

- Mathematics, Computer Science
- 2020 IEEE International Conference on Data Mining (ICDM)
- 2020

This paper proposes a novel parallel deep learning ADMM framework (pdADMM) to achieve layer parallelism: parameters in each layer of neural networks can be updated independently in parallel.

#### References

Decoupled Greedy Learning of CNNs

- Computer Science, Mathematics
- ICML
- 2020

Decoupled Greedy Learning is considered, based on a greedy relaxation of the joint training objective, recently shown to be effective in the context of Convolutional Neural Networks (CNNs) on large-scale image classification, and it is shown that it can lead to better generalization than sequential greedy optimization.

Training Neural Networks Using Features Replay

- Computer Science, Mathematics
- NeurIPS
- 2018

This work proposes a novel parallel-objective formulation for the objective function of the neural network, introduces the features replay algorithm, and proves that it is guaranteed to converge to critical points for the non-convex problem under certain conditions.

Decoupled Parallel Backpropagation with Convergence Guarantee

- Computer Science, Mathematics
- ICML
- 2018

A decoupled parallel backpropagation algorithm for deep learning optimization with a convergence guarantee is proposed, and it is proved that the method guarantees convergence to critical points for the non-convex problem.

Decoupled Neural Interfaces using Synthetic Gradients

- Computer Science, Mathematics
- ICML
- 2017

It is demonstrated that in addition to predicting gradients, the same framework can be used to predict inputs, resulting in models which are decoupled in both the forward and backwards pass -- amounting to independent networks which co-learn such that they can be composed into a single functioning corporation.

Asynchronous Stochastic Gradient Descent with Delay Compensation

- Computer Science
- ICML
- 2017

The proposed algorithm is evaluated on CIFAR-10 and ImageNet datasets, and the experimental results demonstrate that DC-ASGD outperforms both synchronous SGD and asynchronous SGD, and nearly approaches the performance of sequential SGD.

Training Neural Networks with Local Error Signals

- Computer Science, Mathematics
- ICML
- 2019

It is demonstrated, for the first time, that layer-wise training can approach the state-of-the-art on a variety of image datasets, and that a completely backprop-free variant outperforms previously reported results among methods aiming for higher biological plausibility.

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

- Computer Science
- ICML
- 2015

Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.

Densely Connected Convolutional Networks

- Computer Science
- 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017

The Dense Convolutional Network (DenseNet) connects each layer to every other layer in a feed-forward fashion. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters.

Deep Supervised Learning Using Local Errors

- Medicine, Computer Science
- Front. Neurosci.
- 2018

The proposed learning mechanism based on fixed, broad, and random tuning of each neuron to the classification categories outperforms the biologically-motivated feedback alignment learning technique on the CIFAR10 dataset, approaching the performance of standard backpropagation.

Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures

- Computer Science, Mathematics
- NeurIPS
- 2018

Results on scaling up biologically motivated models of deep learning are presented for datasets that need deep networks with appropriate architectures to achieve good performance, and the implementation details help establish baselines for biologically motivated deep learning schemes going forward.