Corpus ID: 236493712

Did the Model Change? Efficiently Assessing Machine Learning API Shifts

@article{Chen2022DidTM,
  title={Did the Model Change? Efficiently Assessing Machine Learning API Shifts},
  author={Lingjiao Chen and Tracy Cai and Matei A. Zaharia and James Y. Zou},
  journal={ArXiv},
  year={2022},
  volume={abs/2107.14203}
}
Machine learning (ML) prediction APIs are increasingly widely used. An ML API can change over time due to model updates or retraining. This presents a key challenge in the usage of the API because it’s often not clear to the user if and how the ML model has changed. Model shifts can affect downstream application performance and also create oversight issues (e.g. if consistency is desired). In this paper, we initiate a systematic investigation of ML API shifts. We first quantify the performance… 
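
The abstract above is truncated. As a rough, hypothetical illustration of the assessment setting only (not the estimator developed in the paper), one can cache an API's predictions on a fixed probe set and compare them against a later snapshot of the same API; the labels and snapshots below are made up.

```python
# Toy illustration of spotting an ML API shift on a fixed probe set.
# `old_preds` would be predictions cached from an earlier snapshot and
# `new_preds` the same API queried today; both are hypothetical here.
from collections import Counter

def summarize_shift(old_preds, new_preds, labels):
    """Compare two prediction snapshots of the same API on identical inputs."""
    assert len(old_preds) == len(new_preds)
    n = len(old_preds)
    disagree = sum(o != p for o, p in zip(old_preds, new_preds))
    old_dist, new_dist = Counter(old_preds), Counter(new_preds)
    print(f"overall disagreement: {disagree / n:.1%}")
    for lab in labels:
        print(f"  {lab}: {old_dist[lab] / n:.1%} -> {new_dist[lab] / n:.1%}")

# Hypothetical cached snapshots of a sentiment API on a 6-example probe set.
old_preds = ["pos", "neg", "pos", "neu", "neg", "pos"]
new_preds = ["pos", "neg", "neu", "neu", "neg", "neg"]
summarize_shift(old_preds, new_preds, labels=["pos", "neu", "neg"])
```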

Citations

FrugalMCT: Efficient Online ML API Selection for Multi-Label Classification Tasks

TLDR
FrugalMCT is a principled framework that adaptively selects which APIs to use for different data in an online fashion while respecting the user's budget; it allows combining multiple ML APIs' predictions for any single data point and selects the best combination using an accuracy estimator.
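
A much-simplified sketch of the general idea, per-sample API selection under a budget, is below; it is not FrugalMCT's actual algorithm, and the API names, costs, and accuracy estimates are invented for illustration.

```python
# Simplified sketch of budget-aware per-sample API selection (not FrugalMCT itself).
# Costs and estimated accuracies are made-up numbers for illustration.

APIS = {                     # name: cost per call
    "cheap_api": 1.0,
    "mid_api": 3.0,
    "premium_api": 10.0,
}

def pick_api(remaining_budget, est_accuracy):
    """Choose the affordable API with the highest estimated accuracy for this sample."""
    affordable = {name: acc for name, acc in est_accuracy.items()
                  if APIS[name] <= remaining_budget}
    if not affordable:
        return None
    return max(affordable, key=affordable.get)

budget = 20.0
stream = [{"cheap_api": 0.80, "mid_api": 0.88, "premium_api": 0.95}] * 5
for est in stream:
    choice = pick_api(budget, est)
    if choice is None:
        break
    budget -= APIS[choice]
    print(choice, "remaining budget:", budget)
```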

Using sequential drift detection to test the API economy

TLDR
This work analyzes both histograms and the call graph of API usage to determine whether the system's usage patterns have shifted, and compares nonparametric statistical and Bayesian sequential analysis approaches to the problem.
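
As a minimal, non-sequential stand-in for the histogram comparison described above (not the paper's procedure), one can run a chi-squared test of homogeneity on per-endpoint call counts from two time windows; the counts below are fabricated.

```python
# Minimal two-window check of whether API usage histograms differ, via a
# chi-squared test of homogeneity. Call counts per endpoint are made up.
import numpy as np
from scipy.stats import chi2_contingency

# Counts per endpoint (login, search, checkout, recommend) in two windows.
window_1 = np.array([500, 1200, 300, 400])
window_2 = np.array([480, 1100, 650, 380])

chi2, p_value, dof, _ = chi2_contingency(np.vstack([window_1, window_2]))
print(f"chi2={chi2:.1f}, p={p_value:.3g}")
if p_value < 0.01:
    print("usage pattern appears to have shifted")
```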

Judging an Airbnb booking by its cover: how profile photos affect guest ratings

Purpose: This research aims to examine whether the facial appearances and expressions of Airbnb host photos influence guest star ratings. Design/methodology/approach: This research analyzed the…

Estimating and Explaining Model Performance When Both Covariates and Labels Shift

Deployed machine learning (ML) models often encounter new user data that differs from their training data. Therefore, estimating how well a given model might perform on the new data is an important…

HAPI: A Large-scale Longitudinal Dataset of Commercial ML API Predictions

Commercial ML APIs offered by providers such as Google, Amazon and Microsoft have dramatically simplified ML adoption in many applications. Numerous companies and academics pay to use ML APIs for…

References

Showing 1-10 of 30 references

FrugalML: How to Use ML Prediction APIs More Accurately and Cheaply

TLDR
This work proposes FrugalML, a principled framework that jointly learns the strengths and weaknesses of each API on different data and performs an efficient optimization to automatically identify the best sequential strategy for adaptively using the available APIs within a budget constraint.
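
FrugalML learns its sequential strategy and thresholds from data; the hand-coded two-stage cascade below only illustrates the flavor of such a strategy, with `cheap_api` and `premium_api` as hypothetical stand-ins for real services.

```python
# Hand-coded two-stage cascade illustrating the idea of a sequential API strategy.
# FrugalML *learns* this structure and its thresholds; this sketch does not.

def cheap_api(x):
    # Pretend low-cost service: returns (label, confidence score).
    return ("cat", 0.62)

def premium_api(x):
    # Pretend high-cost, higher-accuracy service.
    return ("dog", 0.97)

def cascade_predict(x, confidence_threshold=0.8):
    """Call the cheap API first; fall back to the premium API when unsure."""
    label, conf = cheap_api(x)
    if conf >= confidence_threshold:
        return label, "cheap_api"
    label, _ = premium_api(x)
    return label, "premium_api"

print(cascade_predict("example input"))
```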

WILDS: A Benchmark of in-the-Wild Distribution Shifts

TLDR
This work presents WILDS, a benchmark of in-the-wild distribution shifts spanning diverse data modalities and applications, with the aim of encouraging the development of general-purpose methods that are anchored to real-world distribution shifts and that work well across different applications and problem settings.

Learning a Unified Classifier Incrementally via Rebalancing

TLDR
This work develops a new framework for incrementally learning a unified classifier, i.e. a classifier that treats both old and new classes uniformly, and incorporates three components (cosine normalization, a less-forget constraint, and inter-class separation) to mitigate the adverse effects of the imbalance between old and new classes.
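
Of the three components listed above, cosine normalization is easy to sketch in isolation; the PyTorch snippet below is a minimal illustration of a cosine-normalized classifier head, not the paper's full incremental-learning framework, and the dimensions are arbitrary.

```python
# Minimal sketch of a cosine-normalized classifier head (one of the three
# components mentioned above), not the full incremental-learning framework.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    def __init__(self, feat_dim, num_classes, scale=10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale = scale  # fixed here; learnable in some implementations

    def forward(self, features):
        # Cosine similarity between L2-normalized features and class weights,
        # so logits for old and new classes share the same magnitude scale.
        return self.scale * F.linear(F.normalize(features, dim=1),
                                     F.normalize(self.weight, dim=1))

logits = CosineClassifier(feat_dim=64, num_classes=10)(torch.randn(4, 64))
print(logits.shape)  # torch.Size([4, 10])
```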

Model Assertions for Monitoring and Improving ML Models

TLDR
This work proposes a new abstraction, model assertions, that adapts the classical use of program assertions as a way to monitor and improve ML models and proposes an API for generating "consistency assertions" and weak labels for inputs where the consistency assertions fail.
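
A tiny, hypothetical example in the spirit of a consistency assertion (this is not the paper's API): flag video frames whose predicted class flips relative to both neighbors, and treat the neighbors' label as a candidate weak label.

```python
# Toy consistency check in the spirit of model assertions (not the paper's API):
# flag frames where the prediction flips relative to both neighbors, which often
# indicates a flickering, unreliable prediction.
def flag_flickering_frames(frame_preds):
    flagged = []
    for i in range(1, len(frame_preds) - 1):
        prev_label, cur_label, next_label = frame_preds[i - 1], frame_preds[i], frame_preds[i + 1]
        if cur_label != prev_label and cur_label != next_label and prev_label == next_label:
            flagged.append(i)  # candidate for a weak label equal to its neighbors'
    return flagged

preds = ["car", "car", "truck", "car", "car", "person", "person"]
print(flag_flickering_frames(preds))  # -> [2]
```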

Detecting and Correcting for Label Shift with Black Box Predictors

TLDR
Black Box Shift Estimation (BBSE) is proposed to estimate the test-set label distribution p(y), and it is proved that BBSE works even when predictors are biased, inaccurate, or uncalibrated, so long as their confusion matrices are invertible.
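
A minimal numpy sketch of the BBSE idea as summarized above: estimate the black-box predictor's joint confusion matrix on held-out source data, then solve a linear system against the prediction distribution on the target data to recover the target label proportions. The arrays are small made-up examples, not the authors' code or data.

```python
# Minimal sketch of Black Box Shift Estimation (BBSE) for label shift.
import numpy as np

def bbse_target_label_dist(val_true, val_pred, target_pred, num_classes):
    """Estimate target p(y) from a black-box predictor's outputs.

    val_true, val_pred: labels and predictions on held-out *source* data.
    target_pred:        predictions on unlabeled *target* data.
    """
    # C[i, j] ~= P_source(y_hat = i, y = j)
    C = np.zeros((num_classes, num_classes))
    for y_hat, y in zip(val_pred, val_true):
        C[y_hat, y] += 1
    C /= len(val_true)

    # mu[i] ~= P_target(y_hat = i)
    mu = np.bincount(target_pred, minlength=num_classes) / len(target_pred)

    # Solve C q = mu; requires C to be (numerically) invertible.
    q = np.clip(np.linalg.solve(C, mu), 0, None)
    return q / q.sum()

val_true    = np.array([0, 0, 0, 1, 1, 1, 1, 0, 1, 0])
val_pred    = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 0])
target_pred = np.array([1, 1, 1, 0, 1, 1, 0, 1, 1, 1])
print(bbse_target_label_dist(val_true, val_pred, target_pred, num_classes=2))
```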

Optimized stratified sampling for approximate query processing

TLDR
This work casts the task as an optimization problem: given a workload of queries, select a stratified random sample of the original data such that the error in answering the workload queries from the sample is minimized.
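
The paper's allocation is driven by the query workload; the sketch below shows only classic Neyman allocation (sample each stratum in proportion to its size times its standard deviation), a textbook baseline rather than the paper's method, with invented stratum sizes and standard deviations.

```python
# Classic Neyman allocation for stratified sampling: give each stratum a share
# of the sample budget proportional to N_h * sigma_h. This is a textbook
# baseline, not the workload-aware optimization described in the paper.
import numpy as np

def neyman_allocation(stratum_sizes, stratum_stds, total_sample):
    weights = np.asarray(stratum_sizes, dtype=float) * np.asarray(stratum_stds, dtype=float)
    alloc = total_sample * weights / weights.sum()
    # Round, but never allocate more than the stratum's population size.
    return np.minimum(np.rint(alloc).astype(int), stratum_sizes)

sizes = np.array([100_000, 20_000, 5_000])  # rows per stratum (made up)
stds  = np.array([3.0, 15.0, 40.0])         # per-stratum std of the aggregated column
print(neyman_allocation(sizes, stds, total_sample=2_000))  # -> [750 750 500]
```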

Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation

TLDR
This paper proposes a direct importance estimation method that does not involve density estimation and is equipped with a natural cross-validation procedure, so that tuning parameters such as the kernel width can be objectively optimized.
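
The paper's estimator (direct, kernel-based importance estimation with built-in model selection) is more involved than can be shown here; as a rough stand-in, the sketch below uses a different but related technique, estimating the density ratio with a probabilistic classifier that discriminates training from test inputs. The data are synthetic.

```python
# Density-ratio estimation via a probabilistic classifier (a stand-in for the
# paper's direct importance estimator, not its actual procedure).
# w(x) = p_test(x) / p_train(x) is recovered from P(domain=test | x) via Bayes' rule.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x_train = rng.normal(loc=0.0, scale=1.0, size=(500, 1))  # source inputs
x_test  = rng.normal(loc=1.0, scale=1.0, size=(500, 1))  # shifted target inputs

X = np.vstack([x_train, x_test])
d = np.concatenate([np.zeros(len(x_train)), np.ones(len(x_test))])  # 0=train, 1=test

clf = LogisticRegression().fit(X, d)
p_test = clf.predict_proba(x_train)[:, 1]
# Importance weights on the training points, for reweighting a training loss.
weights = (p_test / (1 - p_test)) * (len(x_train) / len(x_test))
print(weights[:5].round(2))
```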

From ImageNet to Image Classification: Contextualizing Progress on Benchmarks

TLDR
This work uses human studies to investigate the consequences of employing a noisy data collection pipeline and to study how specific design choices in the ImageNet creation process impact the fidelity of the resulting dataset, including the introduction of biases that state-of-the-art models exploit.

Learning Word Vectors for Sentiment Analysis

TLDR
This work presents a model that uses a mix of unsupervised and supervised techniques to learn word vectors capturing semantic term-document information as well as rich sentiment content, and finds that it outperforms several previously introduced methods for sentiment classification.