• Corpus ID: 211171885

MLModelScope: A Distributed Platform for Model Evaluation and Benchmarking at Scale

Abdul Dakkak, Cheng Li, Jinjun Xiong, Wen-mei W. Hwu
Machine Learning (ML) and Deep Learning (DL) innovations are being introduced at such a rapid pace that researchers are hard-pressed to analyze and study them. The complicated procedures for evaluating innovations, along with the lack of standard and efficient ways of specifying and provisioning ML/DL evaluation, are a major "pain point" for the community. This paper proposes MLModelScope, an open-source, framework/hardware-agnostic, extensible, and customizable design that enables repeatable… 


MLHarness: A Scalable Benchmarking System for MLCommons
  • Computer Science
  • 2021
The design of a container-based framework for reproducible performance analysis of ML workflows at scale is proposed and validated, showing empirically that the containerized approach is portable and allows arbitrarily low-level performance evaluation when run on two different, production-based HPC clusters with hundreds of GPUs.
AIBench Scenario: Scenario-Distilling AI Benchmarking
  • Wanling Gao, Fei Tang, Zihan Jiang
  • Computer Science
    2021 30th International Conference on Parallel Architectures and Compilation Techniques (PACT)
  • 2021
This paper formalizes a real-world application scenario as a Directed Acyclic Graph-based model, proposes rules to distill it into a permutation of essential AI and non-AI tasks, called a scenario benchmark, and implements an extensible, configurable, and flexible benchmark framework.


XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs
XSP is proposed, an across-stack profiling design that gives a holistic and hierarchical view of ML model execution and accurately captures the latencies at all levels of the HW/SW stack in spite of the profiling overhead.
DLHub: Model and Data Serving for Science
This work presents the Data and Learning Hub for science (DLHub), a multi-tenant system that provides both model repository and serving capabilities with a focus on science applications. It shows that, relative to other model serving systems, DLHub provides greater capabilities, comparable performance without memoization and batching, and significantly better performance when those two techniques can be employed.
SystemML: Declarative Machine Learning on Spark
This paper describes SystemML on Apache Spark, end to end, including insights into various optimizer and runtime techniques as well as performance characteristics.
cuDNN: Efficient Primitives for Deep Learning
A library similar in intent to BLAS, providing optimized routines for deep learning workloads; the presented implementation targets GPUs but, like the BLAS library, could be implemented for other platforms.
A Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques
This work implements customizable compiler autotuning and crowdsourced optimization of diverse workloads across Raspberry Pi 3 devices, reducing execution time and code size by up to 40%, and applies machine learning to predict optimizations.
Katib: A Distributed General AutoML Platform on Kubernetes
Katib is a scalable Kubernetes-native general AutoML platform that supports a range of AutoML algorithms, including both hyperparameter tuning and neural architecture search, and provides a universal platform for researchers as well as enterprises to try, compare, and deploy their AutoML algorithms on any Kubernetes cluster.
Accelerating Deep Learning Frameworks with Micro-Batches
cuDNN is a low-level library that provides GPU kernels frequently used in deep learning. Specifically, cuDNN implements several equivalent convolution algorithms, whose performance and memory footprint differ.
A New Golden Age in Computer Architecture: Empowering the Machine-Learning Revolution
Motivation, suggestions, and warnings to computer architects on how to best contribute to the ML revolution are offered.
Rethinking the Inception Architecture for Computer Vision
This work explores ways to scale up networks that aim to utilize the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization.
Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
The design of Dapper, Google's production distributed systems tracing infrastructure, is introduced, and how its design goals of low overhead, application-level transparency, and ubiquitous deployment on a very large-scale system were met is described.