• Corpus ID: 83459629

On Challenges in Machine Learning Model Management

@article{schelter-model-management,
  title={On Challenges in Machine Learning Model Management},
  author={Sebastian Schelter and Felix Biessmann and Tim Januschowski and David Salinas and Stephan Seufert and Gyuri Szarvas},
  journal={IEEE Data Eng. Bull.}
}
The training, maintenance, deployment, monitoring, organization and documentation of machine learning (ML) models – in short, model management – is a critical task in virtually all production ML use cases. Wrong model management decisions can lead to poor performance of an ML system and result in high maintenance cost. As research on both infrastructure and algorithms is evolving quickly, there is a lack of understanding of challenges and best practices for ML model management…

A Conceptual Vision Toward the Management of Machine Learning Models

A conceptual model for ML development is introduced, together with a vision of a knowledge-based model management system oriented toward model selection, which the authors argue is a promising research direction for a more systematic and comprehensive approach to machine learning model selection.

A Monitoring System for Machine Learning Models in a Large-Scale Context

The monitoring system at ING is found to support relatively efficient model management in terms of model validation and evaluation checks, and to support the models with respect to quality, trust in automated model creation, and usability.

Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities

This work analyzes the provenance graphs of 3000 production ML pipelines at Google, comprising over 450,000 models trained, spanning a period of over four months, in an effort to understand the complexity and challenges underlying production ML.

MLOps - Definitions, Tools and Challenges

A concentrated overview of the Machine Learning Operations (MLOps) area, identifying MLOps not only as the answer to incorporating ML models in production but also as a possible tool for building efficient, robust and accurate machine learning models.

MLife: a lite framework for machine learning lifecycle initialization

  • Cong Yang, Wenfeng Wang, John See
  • Computer Science
    2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)
  • 2021
This work introduces a simple yet flexible framework, MLife, for fast ML lifecycle initialization, built on the observation that data flow in MLife forms a closed loop driven by bad cases, in particular those that hurt ML model performance the most but also provide the most value for further ML model development.

On the Experiences of Adopting Automated Data Validation in an Industrial Machine Learning Project

The results show that adopting a data validation process and tool in ML projects is an effective approach to testing ML-enabled software systems.

Arangopipe, a tool for machine learning meta-data management

Arangopipe is an open-source tool that provides a data model that captures the essential components of any machine learning life cycle and an application programming interface that permits machine-learning engineers to record the details of the salient steps in building their machine learning models.
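Arangopipe's data model is graph-shaped (it is backed by ArangoDB). A minimal sketch of the idea, recording lifecycle components as nodes and lineage as edges, might look as follows; the class and method names here are illustrative assumptions, not Arangopipe's actual API.

```python
# Hypothetical sketch of graph-style ML metadata tracking in the spirit of
# Arangopipe. Names and API are invented for illustration.

class LineageGraph:
    """Records ML lifecycle components as nodes and their lineage as edges."""

    def __init__(self):
        self.nodes = {}   # node id -> attribute dict
        self.edges = []   # (from_id, to_id, relation)

    def register(self, node_id, **attrs):
        self.nodes[node_id] = attrs
        return node_id

    def link(self, src, dst, relation):
        self.edges.append((src, dst, relation))

    def upstream(self, node_id):
        """Return ids of all nodes that node_id transitively depends on."""
        seen, stack = set(), [node_id]
        while stack:
            current = stack.pop()
            for src, dst, _ in self.edges:
                if dst == current and src not in seen:
                    seen.add(src)
                    stack.append(src)
        return seen

g = LineageGraph()
g.register("sales.csv", kind="dataset", rows=120000)
g.register("featureset-v1", kind="featureset")
g.register("model-v1", kind="model", algorithm="gbdt")
g.link("sales.csv", "featureset-v1", "derived_from")
g.link("featureset-v1", "model-v1", "trained_on")
```

A graph layout makes lineage queries such as "which datasets fed this model?" a simple traversal, which is the appeal of a graph database for this use case.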

The Effects of Data Quality on Machine Learning Performance

This work explores empirically the relationship between six data quality dimensions and the performance of widely used machine learning algorithms covering the tasks of classification, regression, and clustering, with the goal of explaining their performance in terms of data quality.

Learning to Validate the Predictions of Black Box Classifiers on Unseen Data

A simple approach is proposed to automate the validation of deployed ML models by estimating a model's predictive performance on unseen, unlabeled serving data; it reliably predicts the performance of black-box models in the majority of cases and outperforms several baselines even in the presence of unspecified data errors.
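The paper trains a meta-model over statistics of a classifier's outputs; the crudest such statistic is the average top-class confidence, which can serve as a naive performance proxy. The sketch below shows only this simplification, not the paper's actual method.

```python
# Simplified sketch of performance estimation on unlabeled serving data.
# The actual approach trains a regressor over output statistics; here we use
# the mean top-class confidence alone as the accuracy estimate.

def estimate_accuracy(probability_vectors):
    """Naively estimate a classifier's accuracy from its predicted class
    probabilities on unlabeled data: average top-class confidence."""
    if not probability_vectors:
        raise ValueError("need at least one prediction")
    top = [max(p) for p in probability_vectors]
    return sum(top) / len(top)

# Predicted probability vectors on an (unlabeled) serving batch.
serving_preds = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4]]
estimate = estimate_accuracy(serving_preds)
```

Confidence-based estimates are well known to be miscalibrated under data shift, which is precisely why the paper learns the mapping from output statistics to performance instead of trusting raw confidence.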

Machine Learning Application Development: Practitioners' Insights

The reported challenges and best practices of ML application development are synthesized into 17 findings to inform the research community about topics that need to be investigated to improve the engineering process and the quality of ML-based applications.

ModelDB: a system for machine learning model management

Ongoing work on ModelDB, a novel end-to-end system for the management of machine learning models, is described; ModelDB introduces a common layer of abstractions to represent models and pipelines, and its frontend allows visual exploration and analysis of models via a web-based interface.
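The "common layer of abstractions" can be illustrated with a toy in-memory registry that records a model together with its pipeline steps and metrics and supports querying; this is an assumption-laden sketch, not ModelDB's actual data model or API.

```python
# Toy in-memory sketch of ModelDB-style abstractions: models logged with
# their pipeline and metrics, queryable afterwards. Illustrative only.

class ModelRegistry:
    def __init__(self):
        self._models = []

    def log_model(self, name, pipeline, metrics):
        """Record one trained model with its pipeline steps and metrics."""
        entry = {"name": name, "pipeline": list(pipeline), "metrics": dict(metrics)}
        self._models.append(entry)
        return entry

    def best(self, metric, higher_is_better=True):
        """Return the logged model with the best value for `metric`."""
        scored = [m for m in self._models if metric in m["metrics"]]
        key = lambda m: m["metrics"][metric]
        return max(scored, key=key) if higher_is_better else min(scored, key=key)

registry = ModelRegistry()
registry.log_model("lr-v1", ["impute", "scale", "logreg"], {"auc": 0.81})
registry.log_model("gbdt-v1", ["impute", "gbdt"], {"auc": 0.86})
best = registry.best("auc")
```

The value of the abstraction is that experiments become comparable after the fact, which is what ModelDB's web frontend builds on.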

Data Management Challenges in Production Machine Learning

The goal of the tutorial is to bring forth data-management issues that arise in the context of machine learning pipelines deployed in production, draw connections to prior work in the database literature, and outline the open research questions that are not addressed by prior art.

Model Selection Management Systems: The Next Frontier of Advanced Analytics

A model enabling the development and maintenance of situation-aware applications in a declarative and therefore economical manner is developed, called KIDS - Knowledge Intensive Data-processing System.

Distributed Machine Learning-but at what COST ?

The results indicate that, while scaling robustly with increasing data set size, current-generation dataflow systems are surprisingly inefficient at training machine learning models and need substantial resources to come within reach of the performance of single-machine libraries.

Probabilistic Demand Forecasting at Scale

A platform built on large-scale, data-centric machine learning approaches, with a particular focus on demand forecasting in retail, that enables the training and application of probabilistic demand forecasting models and provides convenient abstractions and support functionality for forecasting problems.
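The distinguishing word is "probabilistic": instead of a single point forecast, the platform produces quantile forecasts. A minimal way to illustrate the concept (not the platform's actual models) is to widen a point forecast with the empirical distribution of past errors.

```python
# Hedged sketch of probabilistic forecasting: a point forecast widened into
# quantile forecasts via the empirical distribution of past errors.
# The platform described in the paper uses far richer models.

def quantile(sorted_vals, q):
    """Nearest-rank empirical quantile of a pre-sorted list (simplified)."""
    idx = min(len(sorted_vals) - 1, max(0, int(q * len(sorted_vals))))
    return sorted_vals[idx]

def probabilistic_forecast(point_forecast, past_errors, quantiles=(0.1, 0.5, 0.9)):
    """Return {quantile: forecast} by shifting the point forecast with
    empirical error quantiles from a holdout window."""
    errs = sorted(past_errors)
    return {q: point_forecast + quantile(errs, q) for q in quantiles}

# past errors (actual - forecast) observed on a holdout window
forecast = probabilistic_forecast(100.0, [-8, -3, -1, 0, 2, 2, 4, 5, 7, 12])
```

For inventory decisions the P90 forecast matters more than the mean, which is why retail demand forecasting is framed probabilistically in the first place.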

MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis

A system called MISTIQUE is proposed that works with traditional ML pipelines as well as deep neural networks to efficiently capture, store, and query model intermediates for diagnosis, along with a range of optimizations to reduce the storage footprint, including quantization, summarization, and data de-duplication.
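Two of the named optimizations, quantization and de-duplication, can be sketched in a few lines: lossily compress intermediates to 8-bit codes and store them content-addressed so identical blobs are kept once. The storage layout below is an assumption for illustration, not MISTIQUE's actual format.

```python
# Sketch of two MISTIQUE-style storage optimizations for model intermediates:
# lossy 8-bit quantization and content-hash de-duplication. Illustrative only.
import hashlib

def quantize(values, lo=0.0, hi=1.0, levels=256):
    """Map floats in [lo, hi] to 8-bit integer codes (lossy)."""
    scale = (levels - 1) / (hi - lo)
    return bytes(min(levels - 1, max(0, round((v - lo) * scale))) for v in values)

class IntermediateStore:
    """Content-addressed store: identical intermediates are stored once."""

    def __init__(self):
        self.blobs = {}   # digest -> quantized bytes (stored once)
        self.index = {}   # (model, layer) -> digest

    def put(self, model, layer, values):
        blob = quantize(values)
        digest = hashlib.sha256(blob).hexdigest()
        self.blobs.setdefault(digest, blob)   # de-duplicate by content
        self.index[(model, layer)] = digest

store = IntermediateStore()
store.put("model-a", "layer1", [0.0, 0.5, 1.0])
store.put("model-b", "layer1", [0.0, 0.5, 1.0])  # duplicate content
```

Quantization trades a little diagnostic precision for a large storage saving, and de-duplication exploits the fact that many pipeline variants share identical intermediates.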

The Data Linter: Lightweight Automated Sanity Checking for ML Data Sets

The data linter is introduced, a new class of ML tool that automatically inspects ML data sets to identify potential issues in the data and suggest potentially useful feature transforms for a given model type.
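By analogy with code linters, a data linter is a set of rules scanned over each column. The toy rules below (constant column, numbers stored as strings, missing values) are invented for illustration; the actual Data Linter ships a different, larger rule set.

```python
# Toy sketch of a data linter: rules scan a column and emit warnings.
# Rule names and API are invented for illustration.

def lint_column(name, values):
    warnings = []
    non_null = [v for v in values if v is not None]
    # Rule: constant columns carry no signal for most model types.
    if len(set(non_null)) <= 1:
        warnings.append(f"{name}: column is constant; likely uninformative")
    # Rule: numeric data stored as strings usually wants a cast.
    if non_null and all(isinstance(v, str) for v in non_null):
        if all(v.strip().lstrip("-").replace(".", "", 1).isdigit() for v in non_null):
            warnings.append(f"{name}: numbers stored as strings; consider casting")
    # Rule: missing values need explicit handling before training.
    if any(v is None for v in values):
        warnings.append(f"{name}: contains missing values")
    return warnings

issues = lint_column("price", ["3.50", "4.00", None])
```

Like a code linter, the output is advisory: each warning points at a probable issue and a plausible fix rather than failing the pipeline outright.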

Forecasting at Scale

A practical approach to forecasting “at scale” that combines configurable models with analyst-in-the-loop performance analysis, and a modular regression model with interpretable parameters that can be intuitively adjusted by analysts with domain knowledge about the time series are described.
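The "modular regression model with interpretable parameters" decomposes a series into additive components. A heavily simplified sketch of that decomposable idea, a least-squares linear trend plus day-of-week effects, is shown below; the actual model in the paper is far richer (nonlinear trends, holidays, analyst-tunable priors).

```python
# Minimal sketch of the decomposable "trend + seasonality" idea behind
# forecasting at scale: OLS linear trend plus additive day-of-week effects.
# A simplification for illustration, not the paper's model.

def fit_trend(ys):
    """Closed-form OLS fit of y = a + b*t for t = 0..n-1; returns (a, b)."""
    n = len(ys)
    ts = list(range(n))
    t_mean, y_mean = sum(ts) / n, sum(ys) / n
    b = (sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
         / sum((t - t_mean) ** 2 for t in ts))
    return y_mean - b * t_mean, b

def fit_seasonal(ys, period=7):
    """Average detrended residual per position in the seasonal cycle."""
    a, b = fit_trend(ys)
    resid = [y - (a + b * t) for t, y in enumerate(ys)]
    return [sum(resid[p::period]) / len(resid[p::period]) for p in range(period)]

def forecast(ys, t, period=7):
    """Forecast step t as trend plus the matching seasonal effect."""
    a, b = fit_trend(ys)
    season = fit_seasonal(ys, period)
    return a + b * t + season[t % period]
```

Because each component has a direct interpretation (slope, per-weekday effect), an analyst can inspect and adjust them separately, which is the "analyst-in-the-loop" property the paper emphasizes.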

Automating Large-Scale Data Quality Verification

This work presents a system for automating the verification of data quality at scale, which meets the requirements of production use cases and provides a declarative API, which combines common quality constraints with user-defined validation code, and thereby enables 'unit tests' for data.
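The system described exposes a Scala/Spark API; a hedged Python analogue of its declarative "unit tests for data" idea, combining common constraints with user-defined predicates, might look like this. The class and method names are illustrative assumptions, not the system's actual interface.

```python
# Hedged Python sketch of declarative "unit tests for data": common
# constraints plus user-defined predicates, evaluated against a batch of rows.
# Names are invented; the actual system exposes a Scala/Spark API.

class Check:
    def __init__(self, name):
        self.name = name
        self.constraints = []   # (description, predicate over rows)

    def is_complete(self, column):
        """Constraint: column has no missing values."""
        self.constraints.append((
            f"{column} is complete",
            lambda rows, c=column: all(r.get(c) is not None for r in rows)))
        return self

    def is_unique(self, column):
        """Constraint: column values are pairwise distinct."""
        def pred(rows, c=column):
            vals = [r.get(c) for r in rows]
            return len(vals) == len(set(vals))
        self.constraints.append((f"{column} is unique", pred))
        return self

    def satisfies(self, description, predicate):
        """User-defined validation code as an arbitrary predicate."""
        self.constraints.append((description, predicate))
        return self

    def run(self, rows):
        """Evaluate every constraint; return {description: passed}."""
        return {desc: pred(rows) for desc, pred in self.constraints}

rows = [{"id": 1, "price": 3.5}, {"id": 2, "price": -1.0}]
report = (Check("orders")
          .is_complete("id")
          .is_unique("id")
          .satisfies("price is non-negative",
                     lambda rs: all(r["price"] >= 0 for r in rs))
          .run(rows))
```

Declaring constraints separately from evaluation is what lets such checks run automatically on every new batch, exactly like unit tests gate every new commit.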

Hidden Technical Debt in Machine Learning Systems

It is found that incurring massive ongoing maintenance costs is common in real-world ML systems, and several ML-specific risk factors to account for in system design are explored.