A Comparative Measurement Study of Deep Learning as a Service Framework

Authors: Yanzhao Wu, Ling Liu, Calton Pu, Wenqi Cao, Semih Sahin, Wenqi Wei, Qi Zhang
Published in: IEEE Transactions on Services Computing
Big data powered Deep Learning (DL) and its applications have blossomed in recent years, fueled by three technological trends: a large amount of data openly accessible, a growing number of DL frameworks, and a selection of affordable hardware devices. [...] First, we show that for a specific DL framework, different configurations of its hyper-parameters may have a significant impact on performance.

DLBench: a comprehensive experimental evaluation of deep learning frameworks

This paper conducts an extensive experimental evaluation and analysis of six popular deep learning frameworks, namely TensorFlow, MXNet, PyTorch, Theano, Chainer, and Keras, using three types of DL architectures: Convolutional Neural Networks (CNN), Faster Region-based Convolutional Neural Networks (Faster R-CNN), and Long Short-Term Memory (LSTM).

Experimental Characterizations and Analysis of Deep Learning Frameworks

An empirical evaluation of four representative DL frameworks: TensorFlow, Caffe, Torch and Theano through a comparative analysis and characterization shows that the complex interactions among neural networks, hyper-parameters, their specific runtime implementations and datasets are latent factors for the uncertainty of runtime performance and accuracy.

An Overview of the Data-Loader Landscape: Comparative Performance Analysis

This paper is the first to distinguish the dataloader as a separate component in the Deep Learning (DL) workflow and to outline its structure and features.

Selecting and Composing Learning Rate Policies for Deep Neural Networks

  • Yanzhao Wu, Ling Liu
  • Computer Science
    ACM Transactions on Intelligent Systems and Technology
  • 2022
Evaluated using popular benchmark datasets and different DNN models, this approach can effectively deliver high DNN test accuracy, outperform the existing recommended default LR policies, and reduce the DNN training time by 1.6× to 6.7× to meet a targeted model accuracy.

End-to-end Model Inference and Training on Gemmini

The design and implementation of a Gemmini backend for Microsoft’s ONNX Runtime engine is presented, which accelerates the primary computational kernels (matrix multiplications, convolutions, and pooling) while ensuring interoperability between the channel layouts expected by Gemmini and the rest of ONNX Runtime.

Exploring Effects of Computational Parameter Changes to Image Recognition Systems

Image recognition tasks typically use deep learning and require enormous processing power, thus relying on hardware accelerators like GPUs and FPGAs for fast, timely processing. Failure in

Acceleration of Neural Network Training on Hardware via HLS for an Edge-AI Device

A system-on-chip solution is proposed to accelerate neural network training workloads for an edge-AI device: a high-level description of the training algorithm (gradient descent) is written using high-level synthesis (Vivado HLS) to generate a computationally efficient accelerator.

RALaaS: Resource-Aware Learning-as-a-Service in Edge-Cloud Collaborative Smart Connected Communities

This paper proposes a framework that implements a distributed Learning-as-a-Service function by collaboratively integrating the edge and cloud resources required by a learning task, and proposes a deep reinforcement learning based solution to minimize the required learning resources and achieve better accuracy.

Technical and imaging factors influencing performance of deep learning systems for diabetic retinopathy

Various image-related factors play more significant roles than technical factors in determining diagnostic performance, suggesting the importance of having robust training and testing datasets for DL training and deployment in real-world settings.

Data science and Machine learning in the Clouds: A Perspective for the Future

The rise of paradigms like approximate computing, quantum computing and many more in recent times and their applicability in big data processing, data science, analytics, prediction and machine learning in the cloud environments are discussed.



Benchmarking Deep Learning Frameworks: Design Considerations, Metrics and Beyond

It is envisioned that, unlike traditional performance-driven benchmarks, benchmarking deep learning software frameworks should take into account both runtime and accuracy and their latent interaction with hyper-parameters and data-dependent configurations of DL frameworks.

DAWNBench: An End-to-End Deep Learning Benchmark and Competition

DAWNBench is introduced, a benchmark and competition focused on end-to-end training time to achieve a state-of-the-art accuracy level, as well as inference with that accuracy, and will provide a useful, reproducible means of evaluating the many tradeoffs in deep learning systems.
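
The end-to-end metric described above, training time needed to reach a target accuracy, can be illustrated with a minimal sketch. The function name `time_to_accuracy`, the log format, and the sample numbers are all hypothetical and are not taken from DAWNBench itself; the sketch only assumes that a training run records (elapsed seconds, validation accuracy) pairs in chronological order.

```python
from typing import Iterable, Optional, Tuple


def time_to_accuracy(
    history: Iterable[Tuple[float, float]], target: float
) -> Optional[float]:
    """Return the earliest elapsed time at which accuracy reaches `target`.

    `history` holds (elapsed_seconds, validation_accuracy) pairs in
    chronological order. Returns None if the target is never reached.
    """
    for elapsed, accuracy in history:
        if accuracy >= target:
            return elapsed
    return None


# Hypothetical per-epoch log, made up for illustration only.
log = [(60.0, 0.71), (120.0, 0.88), (180.0, 0.93), (240.0, 0.94)]

print(time_to_accuracy(log, 0.93))  # first epoch at or above 93% accuracy
print(time_to_accuracy(log, 0.99))  # target never reached in this log
```

Reporting the first time the threshold is crossed, rather than the time of the best epoch, is what lets such a metric compare runs that trade speed against final accuracy.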

Benchmarking State-of-the-Art Deep Learning Software Tools

This paper presents an attempt to benchmark several state-of-the-art GPU-accelerated deep learning software tools, including Caffe, CNTK, TensorFlow, and Torch, and focuses on evaluating the running time performance of these tools with three popular types of neural networks on two representative CPU platforms and three representative GPU platforms.

TBD: Benchmarking and Analyzing Deep Neural Network Training

A new benchmark for DNN training is proposed, called TBD, that uses a representative set of DNN models that cover a wide range of machine learning applications and a new toolchain for performance analysis for these models is presented that combines the targeted usage of existing performance analysis tools, careful selection of new and existing metrics and methodologies to analyze the results, and utilization of domain specific characteristics of DNN training.

An In-depth Performance Characterization of CPU- and GPU-based DNN Training on Modern Architectures

This is the first study that dives deeper into the performance of DNN training in a holistic manner yet provides an in-depth look at layer-wise performance for different DNNs.

Performance analysis of CNN frameworks for GPUs

This paper analyzes the GPU performance characteristics of five popular deep learning frameworks (Caffe, CNTK, TensorFlow, Theano, and Torch) from the perspective of a representative CNN model, AlexNet, and suggests possible optimization methods to increase the efficiency of CNN models built by the frameworks.

Rethinking the Inception Architecture for Computer Vision

This work is exploring ways to scale up networks in ways that aim at utilizing the added computation as efficiently as possible by suitably factorized convolutions and aggressive regularization.
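
To make "factorized convolutions" concrete, the sketch below counts weights for two factorizations commonly associated with this line of work: replacing one 5×5 convolution with two stacked 3×3 convolutions, and replacing one 3×3 convolution with a 1×3 followed by a 3×1. The summary above does not say which factorizations are used, so the choice of these two, the helper name `conv_weights`, and the channel width of 64 are assumptions for illustration. Biases are ignored.

```python
def conv_weights(kernel_h: int, kernel_w: int, c_in: int, c_out: int) -> int:
    """Number of weights in one convolution layer (biases ignored)."""
    return kernel_h * kernel_w * c_in * c_out


channels = 64  # hypothetical width, kept constant through every layer

# One 5x5 convolution versus two stacked 3x3 convolutions
# (both cover a 5x5 receptive field).
single_5x5 = conv_weights(5, 5, channels, channels)
stacked_3x3 = 2 * conv_weights(3, 3, channels, channels)

# One 3x3 convolution versus a 1x3 followed by a 3x1.
single_3x3 = conv_weights(3, 3, channels, channels)
asymmetric = conv_weights(1, 3, channels, channels) + conv_weights(
    3, 1, channels, channels
)

print(single_5x5, stacked_3x3, stacked_3x3 / single_5x5)
print(single_3x3, asymmetric, asymmetric / single_3x3)
```

With equal channel counts, the ratios depend only on kernel sizes: 18/25 for the stacked 3×3 pair and 6/9 for the asymmetric pair.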

Caffe: Convolutional Architecture for Fast Feature Embedding

Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.

All you need is a good init

Performance is evaluated on GoogLeNet, CaffeNet, FitNets and Residual nets and the state-of-the-art, or very close to it, is achieved on the MNIST, CIFAR-10/100 and ImageNet datasets.

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

This work proposes a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit and derives a robust initialization method that particularly considers the rectifier nonlinearities.
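
As a minimal sketch of the two ideas named above: PReLU computes f(x) = x for x > 0 and f(x) = a·x otherwise, where the slope a is learned, and the rectifier-aware initialization draws weights from a zero-mean Gaussian with standard deviation sqrt(2 / ((1 + a²) · fan_in)). The function names, the use of plain Python lists, and the sample values below are illustrative assumptions, not the authors' code.

```python
import math
import random
from typing import List


def prelu(x: float, a: float) -> float:
    """PReLU: identity for positive inputs, slope `a` for the rest."""
    return x if x > 0 else a * x


def rectifier_init_std(fan_in: int, a: float = 0.0) -> float:
    """Std of the zero-mean Gaussian: sqrt(2 / ((1 + a^2) * fan_in)).

    With a = 0 this reduces to the plain ReLU case, sqrt(2 / fan_in).
    """
    return math.sqrt(2.0 / ((1.0 + a * a) * fan_in))


def init_weights(
    fan_in: int, fan_out: int, a: float = 0.0, seed: int = 0
) -> List[List[float]]:
    """Sample a fan_out x fan_in weight matrix with the std above."""
    rng = random.Random(seed)
    std = rectifier_init_std(fan_in, a)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


print(prelu(2.0, 0.25), prelu(-2.0, 0.25))  # positive passes, negative is scaled
print(rectifier_init_std(8))                # sqrt(2 / 8)
weights = init_weights(fan_in=8, fan_out=4, a=0.25)
print(len(weights), len(weights[0]))
```

Setting a = 0 recovers ordinary ReLU, and a fixed small a gives Leaky ReLU, which is the sense in which PReLU generalizes the traditional rectified unit.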