• Corpus ID: 226246397

No more 996: Understanding Deep Learning Inference Serving with an Automatic Benchmarking System

Huaizheng Zhang, Yizheng Huang, Yonggang Wen, Jianxiong Yin, Kyle Guan

Deep learning (DL) models have become core modules for many applications. However, deploying these models without careful performance benchmarking that accounts for the impact of both hardware and software often leads to poor service and costly operational expenditure. To facilitate the deployment of DL models, we implement an automatic and comprehensive benchmark system for DL developers. To accomplish benchmark-related tasks, the developers only need to prepare a configuration file consisting of a few…

ModelPS: An Interactive and Collaborative Platform for Editing Pre-trained Models at Scale

A low-code solution to enable and empower collaborative DNN model editing and intelligent model serving, and a model genie engine in the backend to aid developers in customizing model editing configurations for given deployment requirements or constraints.

Evaluating the carbon footprint of NLP methods: a survey and analysis of existing tools

Six tools for measuring the energy use and CO2 emissions of NLP methods are described, covering their use and the scope of the measures they provide, and actionable recommendations for accurately measuring the environmental impact of NLP experiments are proposed.

A Serverless Cloud-Fog Platform for DNN-Based Video Analytics with Incremental Learning

This paper designs and implements a holistic cloud-fog system, referred to as VPaaS (Video-Platform-as-a-Service), to execute inference-related tasks; it incorporates limited human feedback into the system to verify the results and adopts incremental machine learning to improve the system continuously.



DLHub: Model and Data Serving for Science

This work presents the Data and Learning Hub for science (DLHub), a multi-tenant system that provides both model repository and serving capabilities with a focus on science applications and shows that relative to other model serving systems, DLHub provides greater capabilities, comparable performance without memoization and batching, and significantly better performance when the latter two techniques can be employed.

DAWNBench: An End-to-End Deep Learning Benchmark and Competition

DAWNBench is introduced, a benchmark and competition focused on end-to-end training time to achieve a state-of-the-art accuracy level, as well as inference with that accuracy, and will provide a useful, reproducible means of evaluating the many tradeoffs in deep learning systems.

Clipper: A Low-Latency Online Prediction Serving System

Clipper is introduced, a general-purpose low-latency prediction serving system that introduces a modular architecture to simplify model deployment across frameworks and applications and improves prediction throughput, accuracy, and robustness without modifying the underlying machine learning frameworks.

MLModelCI: An Automatic Cloud Platform for Efficient MLaaS

A key feature of MLModelCI is its controller, which enables elastic evaluation that uses only idle workers while maintaining online service quality, thus freeing developers from the manual and tedious work often associated with service deployment.

MLPerf Inference Benchmark

This paper presents the benchmarking method for evaluating ML inference systems, MLPerf Inference, and prescribes a set of rules and best practices to ensure comparability across systems with wildly differing architectures.

TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

TVM is a compiler that exposes graph-level and operator-level optimizations to provide performance portability to deep learning workloads across diverse hardware back-ends and automates optimization of low-level programs to hardware characteristics by employing a novel, learning-based cost modeling method for rapid exploration of code optimizations.

TFX: A TensorFlow-Based Production-Scale Machine Learning Platform

TensorFlow Extended (TFX) is presented, a TensorFlow-based general-purpose machine learning platform implemented at Google that was able to standardize the components, simplify the platform configuration, and reduce the time to production from the order of months to weeks, while providing platform stability that minimizes disruptions.

Fathom: reference workloads for modern deep learning methods

This paper assembles Fathom: a collection of eight archetypal deep learning workloads, ranging from the familiar deep convolutional neural network of Krizhevsky et al., to the more exotic memory networks from Facebook's AI research group, and focuses on understanding the fundamental performance characteristics of each model.

Accelerating the Machine Learning Lifecycle with MLflow

MLflow, an open source platform recently launched to streamline the machine learning lifecycle, covers three key challenges: experimentation, reproducibility, and model deployment, using generic APIs that work with any ML library, algorithm and programming language.

Carbontracker: Tracking and Predicting the Carbon Footprint of Training Deep Learning Models

This work proposes that the energy and carbon footprint of model development and training be reported alongside performance metrics using tools like Carbontracker, and hopes this will promote responsible computing in ML and encourage research into energy-efficient deep neural networks.