CHAOS: a parallelization scheme for training convolutional neural networks on Intel Xeon Phi

@article{Viebke2017CHAOSAP,
  title={CHAOS: a parallelization scheme for training convolutional neural networks on Intel Xeon Phi},
  author={Andre Viebke and Suejb Memeti and Sabri Pllana and Ajith Abraham},
  journal={The Journal of Supercomputing},
  year={2017},
  volume={75},
  pages={197-227}
}
Deep learning is an important component of Big Data analytic tools and intelligent applications, such as self-driving cars, computer vision, speech recognition, or precision medicine. However, the training process is computationally intensive and often requires a large amount of time if performed sequentially. Modern parallel computing systems provide the capability to reduce the required training time of deep neural networks. In this paper, we present our parallelization scheme for training…
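The scheme summarized above parallelizes training across threads on the coprocessor. The following is a minimal, illustrative sketch of that general idea only (data parallelism with a synchronized gradient update on a toy logistic-regression model); it is not the authors' CHAOS implementation, which targets CNNs and Xeon Phi threads.

```python
# A minimal sketch of thread-level data parallelism over training examples,
# in the spirit of the scheme described above (not the authors' actual code).
# Assumptions: a toy logistic-regression "network", synthetic data, and
# gradient averaging across workers standing in for per-thread updates.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 32))                  # 256 examples, 32 features
y = (X[:, 0] > 0).astype(float)                 # synthetic binary labels
w = np.zeros(32)
lr, n_workers = 0.1, 4

def worker_gradient(w, X_part, y_part):
    """Gradient of the logistic loss on one worker's share of the batch."""
    p = 1.0 / (1.0 + np.exp(-X_part @ w))
    return X_part.T @ (p - y_part) / len(y_part)

for epoch in range(20):
    # Split each mini-batch across workers, as threads would share images.
    for batch in np.array_split(rng.permutation(len(X)), 8):
        shards = np.array_split(batch, n_workers)
        grads = [worker_gradient(w, X[s], y[s]) for s in shards]
        w -= lr * np.mean(grads, axis=0)        # synchronized update

pred = (1.0 / (1.0 + np.exp(-X @ w))) > 0.5
print("training accuracy:", (pred == y.astype(bool)).mean())
```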
DAPP: Accelerating Training of DNN
  • S. Sapna, N. S. Sreenivasalu, K. Paul
  • Computer Science
  • 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS)
  • 2018
TLDR
This paper presents DAPP, an acceleration technique for training DNNs that uses a ping-pong approach to reduce training time by exploiting distributed local memory and adapting to multi-core architectures.
Demystifying Parallel and Distributed Deep Learning
TLDR
The problem of training DNNs is described from a theoretical perspective, followed by approaches for its parallelization, and potential directions for parallelism in deep learning are extrapolated.
Scaling Analysis of Specialized Tensor Processing Architectures for Deep Learning Models
TLDR
The results give a precise estimate of the higher throughput of tensor processing architectures such as the Google TPUv2 compared to GPUs for large computations, provided that overhead is low and the TPU units are highly utilized by means of large image and batch sizes.
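As an illustration of the utilization argument above, the following arithmetic (with purely hypothetical peak-throughput and overhead numbers, not figures from the cited study) shows how larger batch sizes amortize a fixed per-step overhead and raise effective throughput.

```python
# Illustrative arithmetic only (hypothetical numbers, not measurements from
# the cited study): with a fixed per-step overhead, larger batches amortize
# the overhead and raise effective throughput and accelerator utilization.
peak_images_per_s = 10000.0     # hypothetical peak device throughput
overhead_s = 0.02               # hypothetical fixed per-step overhead

for batch in (32, 256, 2048):
    compute_s = batch / peak_images_per_s
    step_s = compute_s + overhead_s
    throughput = batch / step_s
    utilization = compute_s / step_s
    print(f"batch={batch:5d}  throughput={throughput:8.1f} img/s  "
          f"utilization={utilization:5.1%}")
```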
Iteration Time Prediction for CNN in Multi-GPU Platform: Modeling and Analysis
TLDR
This paper introduces a framework for analyzing the training time of convolutional neural networks (CNNs) on multi-GPU platforms; by decomposing the model, it obtains accurate prediction results without long-term training or complex data collection.
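The sketch below illustrates the general idea behind such prediction: decompose per-iteration time into compute, communication, and overhead terms and fit the coefficients to a few measured points. The decomposition and numbers are illustrative and are not the model or measurements from the cited paper.

```python
# A minimal sketch of iteration-time modeling: fit time = compute + comm +
# overhead from a handful of measured points. Data below are hypothetical.
import numpy as np

# (batch size per GPU, number of GPUs, measured iteration time in ms)
samples = np.array([
    [32, 1, 110.0],
    [64, 1, 205.0],
    [32, 2, 135.0],
    [64, 4, 260.0],
])
batch, gpus, t = samples[:, 0], samples[:, 1], samples[:, 2]

# Features: per-GPU compute ~ batch size, communication ~ number of GPUs,
# plus a constant overhead term.
A = np.column_stack([batch, gpus, np.ones_like(batch)])
coef, *_ = np.linalg.lstsq(A, t, rcond=None)

predicted = A @ coef
print("fitted coefficients (compute, comm, overhead):", coef)
print("predicted vs measured:", list(zip(predicted.round(1), t)))
```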
A systematic literature review on hardware implementation of artificial intelligence algorithms
TLDR
This work presents a systematic literature review that explores the available hardware accelerators for AI and ML tools, using FPGAs, GPUs, and ASICs to accelerate computationally intensive tasks.
Supervised Deep Learning in High Energy Phenomenology: a Mini Review
TLDR
This note first describes various learning models and then recapitulates their applications to high energy phenomenological studies, including the machine learning scan in the analysis of new physics parameter space, and graph neural networks in the search for top-squark production and in the measurement of the top-Higgs coupling at the LHC.
IoT-based Urban Noise Identification Using Machine Learning: Performance of SVM, KNN, Bagging, and Random Forest
TLDR
A machine learning based method for urban noise identification is presented that uses an inexpensive IoT unit, Mel-frequency cepstral coefficients for audio feature extraction, and supervised classification algorithms (support vector machine, k-nearest neighbors, bootstrap aggregation, and random forest) for noise classification.
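A minimal sketch of that pipeline follows, assuming librosa and scikit-learn are available; synthetic tones stand in for the paper's real noise recordings, and only the SVM classifier is shown.

```python
# A minimal sketch of the pipeline described above: MFCC features plus a
# supervised classifier. librosa and scikit-learn are assumed installed;
# the two synthetic "noise classes" stand in for real recordings.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

sr, rng = 22050, np.random.default_rng(0)

def make_clip(kind):
    t = np.linspace(0, 1.0, sr, endpoint=False)
    tone = np.sin(2 * np.pi * (440 if kind == 0 else 1200) * t)
    return (tone + 0.3 * rng.normal(size=sr)).astype(np.float32)

def features(clip):
    mfcc = librosa.feature.mfcc(y=clip, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)              # average MFCCs over time

X = np.array([features(make_clip(k)) for k in [0, 1] * 40])
y = np.array([0, 1] * 40)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```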
Using Cognitive Computing for Learning Parallel Programming: An IBM Watson Solution
TLDR
This paper presents a meta-modelling framework that automates the labor-intensive, time-consuming, and therefore expensive process of programming for parallel computing.
A simulation study of a smart living IoT solution for remote elderly care
TLDR
A simulation study of a smart living IoT solution for elderly people living in their own homes is presented, focusing on a carephone device that establishes a voice connection over IP with caregivers or relatives.

References

Showing 1-10 of 84 references
Training Large Scale Deep Neural Networks on the Intel Xeon Phi Many-Core Coprocessor
TLDR
A many-core algorithm, based on a parallel method, is used on Intel Xeon Phi many-core systems to speed up the unsupervised training of Sparse Autoencoders and Restricted Boltzmann Machines, suggesting that the Intel Xeon Phi can offer an efficient but more general-purpose way to parallelize deep learning algorithms compared to a GPU.
Accelerating Large-Scale Convolutional Neural Networks with Parallel Graphics Multiprocessors
TLDR
This work has adapted the inherent multi-level parallelism of CNNs to Nvidia's CUDA GPU architecture to accelerate training by two orders of magnitude, making it possible to apply CNN architectures to pattern recognition tasks on datasets of high-resolution natural images.
Accelerating pattern matching in neuromorphic text recognition system using Intel Xeon Phi coprocessor
TLDR
From a scalability standpoint on a High Performance Computing (HPC) platform, it is shown that efficient workload partitioning and resource management can double the performance of this many-core architecture for neuromorphic applications.
A snapshot of image pre-processing for convolutional neural networks: case study of MNIST
TLDR
This paper shows and analyzes the impact of different preprocessing techniques on the performance of three CNNs (LeNet, Network3, and DropConnect) together with their ensembles, and demonstrates that data preprocessing techniques, such as the combination of elastic deformation and rotation, together with ensembles, have a high potential to further improve the state-of-the-art accuracy in MNIST classification.
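A minimal sketch of the two augmentations named above (rotation and elastic deformation) follows, using SciPy image operations on a toy 28x28 image; the smoothing and displacement parameters are illustrative rather than the settings used in the cited study.

```python
# A minimal sketch of rotation plus elastic deformation on one 28x28 image.
# Parameters are illustrative, not the cited study's settings.
import numpy as np
from scipy.ndimage import rotate, gaussian_filter, map_coordinates

def augment(image, max_angle=15.0, alpha=8.0, sigma=3.0, rng=None):
    rng = rng or np.random.default_rng()
    # Random small rotation.
    out = rotate(image, rng.uniform(-max_angle, max_angle),
                 reshape=False, mode="nearest")
    # Elastic deformation: smooth random displacement fields.
    dx = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    ys, xs = np.meshgrid(np.arange(image.shape[0]),
                         np.arange(image.shape[1]), indexing="ij")
    coords = np.array([ys + dy, xs + dx])
    return map_coordinates(out, coords, order=1, mode="nearest")

image = np.zeros((28, 28)); image[10:18, 12:16] = 1.0      # toy "digit"
print(augment(image, rng=np.random.default_rng(0)).shape)  # (28, 28)
```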
High Performance Convolutional Neural Networks for Document Processing
TLDR
Three novel approaches to speeding up CNNs are presented: a) unrolling convolution, b) using BLAS (basic linear algebra subroutines), and c) using GPUs (graphics processing units).
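The first two of these ideas can be illustrated together: unrolling the convolution (im2col) lays out input patches as columns so the convolution becomes a single matrix product that a BLAS routine or GPU GEMM can execute. The sketch below is an illustrative NumPy version, not the paper's implementation.

```python
# A minimal sketch of the "unrolling convolution" idea (im2col): patches of
# the input become columns of a matrix, so the convolution reduces to one
# GEMM-style product, verified against a direct loop implementation.
import numpy as np

def conv2d_direct(x, k):
    H, W = x.shape; kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def conv2d_im2col(x, k):
    H, W = x.shape; kh, kw = k.shape
    oh, ow = H - kh + 1, W - kw + 1
    cols = np.empty((kh * kw, oh * ow))   # each column holds one patch
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = x[i:i + kh, j:j + kw].ravel()
    return (k.ravel() @ cols).reshape(oh, ow)   # one matrix product

rng = np.random.default_rng(0)
x, k = rng.normal(size=(8, 8)), rng.normal(size=(3, 3))
print(np.allclose(conv2d_direct(x, k), conv2d_im2col(x, k)))  # True
```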
Benchmarking State-of-the-Art Deep Learning Software Tools
TLDR
This paper presents an attempt to benchmark several state-of-the-art GPU-accelerated deep learning software tools, including Caffe, CNTK, TensorFlow, and Torch, and focuses on evaluating the running time performance of these tools with three popular types of neural networks on two representative CPU platforms and three representative GPU platforms.
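A minimal sketch of how such running-time comparisons are typically made follows (warm-up runs, then repeated timed runs with median and spread); the NumPy matrix multiplication is only a stand-in for a framework's training step.

```python
# A minimal wall-clock timing harness: warm-up, repeated timed runs, and a
# robust summary. A NumPy matmul stands in for one training step.
import time
import numpy as np

def benchmark(step, warmup=3, repeats=10):
    for _ in range(warmup):            # warm-up: caches, lazy initialization
        step()
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        step()
        times.append(time.perf_counter() - start)
    return np.median(times), np.std(times)

a = np.random.default_rng(0).normal(size=(512, 512))
median_s, std_s = benchmark(lambda: a @ a)
print(f"median {median_s * 1e3:.2f} ms, std {std_s * 1e3:.2f} ms")
```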
High-Performance Neural Networks for Visual Object Classification
We present a fast, fully parameterizable GPU implementation of Convolutional Neural Network variants. Our feature extractors are neither carefully designed nor pre-wired, but rather learned in a…
ImageNet classification with deep convolutional neural networks
TLDR
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into 1000 different classes; it employed a recently developed regularization method called "dropout" that proved to be very effective.
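The "dropout" regularization mentioned above randomly deactivates units during training. The sketch below uses the common inverted-dropout formulation (rescaling at training time), which is not necessarily the exact variant of the cited paper.

```python
# A minimal sketch of dropout in the "inverted" formulation: each unit is
# zeroed with probability p during training and the survivors are rescaled
# so the expected activation is unchanged; at inference it is the identity.
import numpy as np

def dropout(activations, p=0.5, training=True, rng=None):
    """Randomly zero each unit with probability p during training."""
    if not training or p == 0.0:
        return activations
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

h = np.ones((2, 8))
print(dropout(h, p=0.5, rng=np.random.default_rng(0)))
print(dropout(h, training=False))            # identity at inference time
```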
Comparative Study of Deep Learning Software Frameworks
TLDR
A comparative study of five deep learning frameworks (Caffe, Neon, TensorFlow, Theano, and Torch) on three aspects (extensibility, hardware utilization, and speed) finds that Theano and Torch are the most easily extensible frameworks.
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
TLDR
The TensorFlow interface, and an implementation of that interface built at Google, are described; the system has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields.
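A minimal example of the TensorFlow interface, via its Keras API and assuming TensorFlow 2.x is installed, is shown below; it illustrates only the style of the interface, not the distributed production systems described in the cited paper.

```python
# A minimal TensorFlow/Keras example on synthetic data, illustrating the
# interface style only (TensorFlow 2.x assumed).
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Synthetic data standing in for a real dataset.
x = np.random.default_rng(0).normal(size=(256, 32)).astype("float32")
y = np.random.default_rng(1).integers(0, 10, size=(256,))

model.fit(x, y, epochs=2, batch_size=32, verbose=0)
print(model.evaluate(x, y, verbose=0))
```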