Local Critic Training for Model-Parallel Learning of Deep Neural Networks
@article{Lee2018LocalCT, title={Local Critic Training for Model-Parallel Learning of Deep Neural Networks}, author={Hojung Lee and Cho-Jui Hsieh and Jong-Seok Lee}, journal={IEEE Transactions on Neural Networks and Learning Systems}, year={2018}, volume={33}, pages={4424-4436} }
In this article, we propose a novel model-parallel learning method, called local critic training, which trains neural networks using additional modules called local critic networks. The main network is divided into several layer groups, and each layer group is updated through error gradients estimated by the corresponding local critic network. We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and…
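The abstract above describes how each layer group is updated with error gradients estimated by its local critic network. Below is a minimal, hedged sketch of that idea in PyTorch, assuming a classification setting; the class name `LocalCriticTrainer`, the cascaded critic-fitting loss, and the sequential loop are illustrative simplifications, not the authors' reference implementation (which exploits the decoupling for model-parallel execution).

```python
# Minimal sketch of the local-critic idea (illustrative; names and the exact
# critic losses are assumptions, not the paper's reference code).
import torch
import torch.nn.functional as F

class LocalCriticTrainer:
    def __init__(self, groups, critics, lr=0.01):
        # groups : list of nn.Module layer groups forming the main network
        # critics: list of small heads, one per non-final group, mapping that
        #          group's features to class logits
        self.groups, self.critics = list(groups), list(critics)
        params = [p for m in self.groups + self.critics for p in m.parameters()]
        self.opt = torch.optim.SGD(params, lr=lr)

    def step(self, x, y):
        self.opt.zero_grad()
        h, critic_losses = x, []
        for group, critic in zip(self.groups[:-1], self.critics):
            h = group(h)
            critic_losses.append(F.cross_entropy(critic(h), y))
            # Detaching blocks gradients from later stages, so each group is
            # updated only through the loss computed by its own critic.
            h = h.detach()
        final_loss = F.cross_entropy(self.groups[-1](h), y)
        # Simplified cascaded target: each critic's loss is pulled toward the
        # loss of the next stage, so its gradient estimates track the true
        # error signal.
        targets = critic_losses[1:] + [final_loss]
        critic_fit = sum((l - t.detach()) ** 2
                         for l, t in zip(critic_losses, targets))
        (final_loss + sum(critic_losses) + critic_fit).backward()
        self.opt.step()
```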
7 Citations
Trinity: Neural Network Adaptive Distributed Parallel Training Method Based on Reinforcement Learning
- Computer Science, Algorithms
- 2022
Trinity, an adaptive distributed parallel training method based on reinforcement learning, is presented to automate the search and tuning of parallel strategies and achieves up to 5% reductions in runtime, communication, and memory overhead, and up to a 40% increase in parallel strategy search speeds.
Penalty and Augmented Lagrangian Methods for Layer-parallel Training of Residual Networks
- Computer Science, ArXiv
- 2020
A layer-parallel training algorithm is proposed to overcome the scalability barrier caused by the serial nature of forward-backward propagation in deep residual learning and can provide speedup over the traditional layer-serial training methods.
Mapping DCNN to a Three Layer Modular Architecture: A Systematic Way for Obtaining Wider and More Effective Network
- Computer Science, 2020 IEEE 5th International Conference on Signal and Image Processing (ICSIP)
- 2020
We propose a modular Deep Convolutional Neural Network (DCNN) architecture characterized by a block-like design and the re-use of parameters by certain blocks. We leverage networks from the…
Efficient Neuromorphic Hardware Through Spiking Temporal Online Local Learning
- Computer Science, IEEE Transactions on Very Large Scale Integration (VLSI) Systems
- 2022
This work introduces an effective hardware-friendly local training algorithm compatible with sparse temporal input coding and binary random classification weights, and explores spike sparsity in communication, parallelism in vector–matrix operations and process-level dataflow, and locality of training errors, which leads to low cost and fast training speed.
MS-NET: modular selective network
- Computer Science, International Journal of Machine Learning and Cybernetics
- 2020
The modular nature and low parameter requirement of the network make it well suited to distributed and low-compute environments and play a vital role in the performance of the network.
BackLink: Supervised Local Training with Backward Links
- Computer Science, ArXiv
- 2022
This method yields savings in simulation runtime on ResNet-110 compared to standard BP; therefore, it could create new opportunities for improving training algorithms towards better efficiency and biological…
Single-Layer Vision Transformers for More Accurate Early Exits with Less Overhead
- Computer Science, Neural Networks
- 2022
References
Local Critic Training of Deep Neural Networks
- Computer Science, 2019 International Joint Conference on Neural Networks (IJCNN)
- 2019
A novel approach to train deep neural networks by unlocking the layer-wise dependency of backpropagation training, which is also useful from multi-model perspectives, including structural optimization of neural networks, computationally efficient progressive inference, and ensemble classification for performance improvement.
Training Neural Networks Using Features Replay
- Computer Science, NeurIPS
- 2018
This work proposes a novel parallel-objective formulation of the neural network's objective function, introduces the features replay algorithm, and proves that it is guaranteed to converge to critical points of the non-convex problem under certain conditions.
Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks
- Computer Science, ICML
- 2018
The experiments show that layer-wise parallelism outperforms current parallelization approaches by increasing training speed, reducing communication costs, and achieving better scalability to multiple GPUs, while maintaining the same network accuracy.
Neural Architecture Search with Reinforcement Learning
- Computer Science, ICLR
- 2017
This paper uses a recurrent network to generate the model descriptions of neural networks and trains this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set.
PruneTrain: fast neural network training by dynamic sparse model reconfiguration
- Computer Science, SC
- 2019
This work proposes PruneTrain, a cost-efficient mechanism that gradually reduces computational cost during training by using a structured group-lasso regularization approach that drives the optimization toward both high accuracy and small weight values.
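As a rough illustration of the structured group-lasso idea described above, the sketch below adds a channel-wise group-lasso penalty to a PyTorch model's loss; the grouping (one group per convolutional output channel) and the coefficient `lam` are assumptions for exposition, not PruneTrain's exact configuration.

```python
# Hedged sketch of a structured group-lasso penalty that pushes whole output
# channels toward zero during training (grouping and `lam` are assumptions).
import torch.nn as nn

def group_lasso_penalty(model: nn.Module, lam: float = 1e-4):
    penalty = 0.0
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            # One group per output channel: L2 norm of that channel's weights.
            w = m.weight.view(m.out_channels, -1)
            penalty = penalty + w.norm(dim=1).sum()
    return lam * penalty

# Usage: loss = criterion(model(x), y) + group_lasso_penalty(model)
```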
Integrated Model, Batch, and Domain Parallelism in Training Neural Networks
- Computer Science, SPAA
- 2018
We propose a new integrated method of exploiting model, batch and domain parallelism for the training of deep neural networks (DNNs) on large distributed-memory computers using minibatch stochastic…
Training Neural Networks Without Gradients: A Scalable ADMM Approach
- Computer Science, ICML
- 2016
This paper explores an unconventional training method that uses alternating direction methods and Bregman iteration to train networks without gradient descent steps, and exhibits strong scaling in the distributed setting, yielding linear speedups even when split over thousands of cores.
BranchyNet: Fast inference via early exiting from deep neural networks
- Computer Science, 2016 23rd International Conference on Pattern Recognition (ICPR)
- 2016
The BranchyNet architecture is presented, a novel deep network architecture that is augmented with additional side branch classifiers that can both improve accuracy and significantly reduce the inference time of the network.
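The early-exit mechanism summarized above can be sketched as follows, assuming a PyTorch backbone split into stages with one side-branch classifier per stage; the entropy threshold, the batch-level exit decision, and the module names are illustrative (BranchyNet decides per sample and trains all exits with a joint weighted loss).

```python
# Illustrative sketch of early exiting via side-branch classifiers; this is an
# assumption-laden simplification, not BranchyNet's published implementation.
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitNet(nn.Module):
    def __init__(self, stages, branches, final_head, threshold=0.5):
        super().__init__()
        self.stages = nn.ModuleList(stages)      # backbone segments
        self.branches = nn.ModuleList(branches)  # one side classifier per segment
        self.final_head = final_head             # classifier at the network's end
        self.threshold = threshold               # entropy threshold for exiting

    def forward(self, x):
        for stage, branch in zip(self.stages, self.branches):
            x = stage(x)
            logits = branch(x)
            probs = F.softmax(logits, dim=1)
            entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1).mean()
            # Confident (low-entropy) batches exit early at inference time.
            if not self.training and entropy.item() < self.threshold:
                return logits
        return self.final_head(x)
```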
Sobolev Training for Neural Networks
- Computer Science, NIPS
- 2017
Sobolev Training for neural networks is introduced, a method for incorporating target derivatives in addition to the target values while training, which results in models with higher accuracy and stronger generalisation on three distinct domains.
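A minimal sketch of a Sobolev-style loss in a distillation setting is shown below; the `student`/`teacher` naming and matching the gradient of the summed outputs (rather than full Jacobians or their stochastic projections, as the paper considers) are simplifying assumptions.

```python
# Hedged sketch of a Sobolev-style loss: match the teacher's outputs and its
# input-gradients (names and the summed-output derivative are assumptions).
import torch
import torch.nn.functional as F

def sobolev_loss(student, teacher, x, alpha=1.0):
    x = x.detach().clone().requires_grad_(True)
    s_out, t_out = student(x), teacher(x)
    # Standard value-matching term.
    value_loss = F.mse_loss(s_out, t_out.detach())
    # Derivative-matching term: gradients of the summed outputs w.r.t. the input.
    s_grad = torch.autograd.grad(s_out.sum(), x, create_graph=True)[0]
    t_grad = torch.autograd.grad(t_out.sum(), x)[0]
    return value_loss + alpha * F.mse_loss(s_grad, t_grad.detach())
```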
Going deeper with convolutions
- Computer Science, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition…