Performance Implications of Big Data in Scalable Deep Learning: On the Importance of Bandwidth and Caching

  • Miroslav Hodak, David Ellison, Peter Seidel, Ajay Dholakia
  • 2018 IEEE International Conference on Big Data (Big Data)
Deep learning techniques have revolutionized many areas including computer vision and speech recognition. While such networks require tremendous amounts of data, the requirement for and connection to Big Data storage systems is often undervalued and not well understood. In this paper, we explore the relationship between Big Data storage, networking, and Deep Learning workloads to understand key factors for designing Big Data/Deep Learning integrated solutions. We find that storage and… 
2 Citations

Performance Evaluation and Benchmarking for the Era of Cloud(s): 11th TPC Technology Conference, TPCTC 2019, Los Angeles, CA, USA, August 26, 2019, Revised Selected Papers
An extension for TPC benchmarks that addresses the requirements of big data processing in cloud environments is introduced as the Elasticity Test and evaluated under TPCx-BB, showing how systems that fail to meet SLAs under concurrency, due to queuing or degraded performance, negatively affect the new metric.
Challenges in Distributed MLPerf


Towards Scalable Deep Learning via I/O Analysis and Optimization
  • S. Pumma, Min Si, Wu-chun Feng, P. Balaji
  • Computer Science
    2017 IEEE 19th International Conference on High Performance Computing and Communications; IEEE 15th International Conference on Smart City; IEEE 3rd International Conference on Data Science and Systems (HPCC/SmartCity/DSS)
  • 2017
A detailed analysis of the performance bottlenecks of Caffe on large supercomputing systems is presented, along with LMDBIO, an optimized I/O plugin for Caffe that takes the data access pattern into account in order to vastly improve I/O performance.
Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective
The hardware and software infrastructure that supports machine learning at global scale is described, leveraging both GPU and CPU platforms for training and abundant CPU capacity for real-time inference.
Parallel I/O Optimizations for Scalable Deep Learning
LMDBIO-DM is proposed, an enhanced version of LMDBIO-LMM that optimizes the I/O access of Caffe in distributed-memory environments and can improve the overall execution time of Caffe by more than 30-fold and 2-fold, respectively.
Towards Evaluation of Tensorflow Performance in a Distributed Compute Environment
The results show that with the right choice of input parameters and appropriate hardware, GPU-equipped general-purpose compute clusters can provide comparable deep learning training performance to specialized machines designed for AI workloads.
Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines
Poseidon is proposed, a scalable system architecture for distributed inter-machine communication in existing DL frameworks that achieves state-of-the-art training speedup across multiple models and well-established datasets using a commodity GPU cluster of 8 nodes, and converges to the same objectives as a single machine.
Distributed Deep Neural Networks Over the Cloud, the Edge and End Devices
Compared with the traditional method of offloading raw sensor data to be processed in the cloud, DDNN locally processes most sensor data on end devices while achieving high accuracy and is able to reduce the communication cost by a factor of over 20x.
Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability
  • J. Keuper, F. Pfreundt
  • Computer Science
    2016 2nd Workshop on Machine Learning in HPC Environments (MLHPC)
  • 2016
The presented results show that the current state-of-the-art approach, using data-parallelized Stochastic Gradient Descent (SGD), is quickly turning into a vastly communication-bound problem, leading to poor scalability of DNN training in most practical scenarios.
Parallel and Distributed Deep Learning
The goal of this report is to explore ways to parallelize/distribute deep learning in multi-core and distributed settings. We have analyzed (empirically) the speedup in training a CNN using
Efficient Processing of Deep Neural Networks: A Tutorial and Survey
Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver