Corpus ID: 236493712

Did the Model Change? Efficiently Assessing Machine Learning API Shifts

@article{Chen2021DidTM,
  title={Did the Model Change? Efficiently Assessing Machine Learning API Shifts},
  author={Lingjiao Chen and Tracy Cai and Matei A. Zaharia and James Y. Zou},
  journal={ArXiv},
  year={2021},
  volume={abs/2107.14203}
}
Machine learning (ML) prediction APIs are increasingly widely used. An ML API can change over time due to model updates or retraining. This presents a key challenge in the usage of the API because it’s often not clear to the user if and how the ML model has changed. Model shifts can affect downstream application performance and also create oversight issues (e.g. if consistency is desired). In this paper, we initiate a systematic investigation of ML API shifts. We first quantify the performance… 

FrugalMCT: Efficient Online ML API Selection for Multi-Label Classification Tasks
FrugalMCT is a principled framework that adaptively selects the APIs to use for different data in an online fashion while respecting the user’s budget. It allows combining ML APIs’ predictions for any single data point and selects the best combination based on an accuracy estimator.
Using sequential drift detection to test the API economy
This work analyzes both histograms and the call graph of API usage to determine whether the usage patterns of the system have shifted, and compares the application of nonparametric statistical and Bayesian sequential analysis to the problem.
Judging an Airbnb booking by its cover: how profile photos affect guest ratings
Purpose: This research aims to examine whether the facial appearances and expressions of Airbnb host photos influence guest star ratings. Design/methodology/approach: This research analyzed the

References

FrugalML: How to Use ML Prediction APIs More Accurately and Cheaply
This work proposes FrugalML, a principled framework that jointly learns the strength and weakness of each API on different data, and performs an efficient optimization to automatically identify the best sequential strategy to adaptively use the available APIs within a budget constraint.
WILDS: A Benchmark of in-the-Wild Distribution Shifts
This work presents WILDS, a benchmark of in-the-wild distribution shifts spanning diverse data modalities and applications, with the aim of encouraging the development of general-purpose methods that are anchored to real-world distribution shifts and that work well across different applications and problem settings.
Beyond Accuracy: Behavioral Testing of NLP Models with CheckList
Although measuring held-out accuracy has been the primary approach to evaluate generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models
Learning a Unified Classifier Incrementally via Rebalancing
This work develops a new framework for incrementally learning a unified classifier, e.g. a classifier that treats both old and new classes uniformly, and incorporates three components, cosine normalization, less-forget constraint, and inter-class separation, to mitigate the adverse effects of the imbalance.
Model Assertions for Monitoring and Improving ML Models
This work proposes a new abstraction, model assertions, that adapts the classical use of program assertions as a way to monitor and improve ML models and proposes an API for generating "consistency assertions" and weak labels for inputs where the consistency assertions fail.
Detecting and Correcting for Label Shift with Black Box Predictors
Black Box Shift Estimation (BBSE) is proposed to estimate the test label distribution p(y), and it is proved that BBSE works even when predictors are biased, inaccurate, or uncalibrated, so long as their confusion matrices are invertible.
Regularized Learning for Domain Adaptation under Label Shifts
We propose Regularized Learning under Label shifts (RLLS), a principled and a practical domain-adaptation algorithm to correct for shifts in the label distribution between a source and a target
Benchmarking Intent Detection for Task-Oriented Dialog Systems
The results show that Watson Assistant's intent detection model outperforms other commercial solutions and is comparable to large pretrained language models while requiring only a fraction of computational resources and training data.
Optimized stratified sampling for approximate query processing
This work treats the problem as an optimization problem where, given a workload of queries, a stratified random sample of the original data is selected such that the error in answering the workload queries using the sample is minimized.