Leveraging Unlabeled Data to Predict Out-of-Distribution Performance

S. Garg, Sivaraman Balakrishnan, Zachary Chase Lipton, Behnam Neyshabur, Hanie Sedghi
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions that may cause performance drops. In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data. We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model’s confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold.
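The ATC procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the threshold is chosen so that the fraction of held-out source examples above it matches source accuracy, with confidences and correctness arrays as hypothetical inputs.

```python
import numpy as np

def atc_predict_accuracy(source_conf, source_correct, target_conf):
    """Sketch of Average Thresholded Confidence (ATC).

    source_conf:    model confidences on held-out labeled source data
    source_correct: 0/1 correctness of the model on that source data
    target_conf:    model confidences on unlabeled target data

    Pick threshold t so that the fraction of source confidences above t
    equals source accuracy; predict target accuracy as the fraction of
    target confidences above t.
    """
    source_acc = source_correct.mean()
    t = np.quantile(source_conf, 1.0 - source_acc)
    return (target_conf > t).mean()
```

In practice the paper also considers score functions other than raw max softmax confidence (e.g. negative entropy); the thresholding logic is the same.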

Estimating Model Performance under Domain Shifts with Class-Specific Confidence Scores

Introducing class-wise calibration within a performance-estimation framework for imbalanced datasets improves accuracy estimation by 18% on classification under natural domain shifts, and doubles estimation accuracy on segmentation tasks, compared with prior methods.

RankFeat: Rank-1 Feature Removal for Out-of-distribution Detection

RankFeat is proposed, a simple yet effective post hoc approach for OOD detection by removing the rank-1 matrix composed of the largest singular value and the associated singular vectors from the high-level feature.

Understanding new tasks through the lens of training data via exponential tilting

This work forms a distribution shift model based on the exponential tilt assumption and learns train data importance weights minimizing the KL divergence between labeled train and unlabeled target datasets, which can be used for downstream tasks such as target performance evaluation, fine-tuning, and model selection.

Performance Prediction Under Dataset Shift

Empirical validation on a benchmark of ten tabular datasets shows that models based upon state-of-the-art shift detection metrics are not expressive enough to generalize to unseen domains, while Error Predictors bring a consistent improvement in performance prediction under shift.

Predicting Out-of-Distribution Error with the Projection Norm

This work proposes a metric—Projection Norm—to predict a model’s performance on out-of-distribution (OOD) data without access to ground truth labels and finds that Projection Norm is the only approach that achieves non-trivial detection performance on adversarial examples.

A Unified Survey on Anomaly, Novelty, Open-Set, and Out-of-Distribution Detection: Solutions and Future Challenges

This survey provides a cross-domain, comprehensive review of eminent works in each area, identifies their commonalities, and sheds light on future lines of research, with the aim of bringing these fields closer together.

Monitoring Model Deterioration with Explainable Uncertainty Estimation via Non-parametric Bootstrap

This work uses non-parametric bootstrapped uncertainty estimates and SHAP values to provide explainable uncertainty estimation as a technique that aims to monitor the deterioration of machine learning models in deployment environments, as well as determine the source of model deterioration when target labels are not available.

Domain Adaptation under Open Set Label Shift

We introduce the problem of domain adaptation under Open Set Label Shift (OSLS), where the label distribution can change arbitrarily and a new class may arrive during deployment, but the class-conditional distributions of the previously seen classes remain fixed.

Deconstructing Distributions: A Pointwise Framework of Learning

This work studies a point’s profile: the relationship between models’ average performance on the test distribution and their pointwise performance on that individual point, and finds that profiles can yield new insights into the structure of both models and data, both in- and out-of-distribution.

Predicting Out-of-Domain Generalization with Local Manifold Smoothness

This work proposes a novel complexity measure based on local manifold smoothness, defined via a classifier’s output sensitivity to perturbations in the manifold neighborhood around a given test point, which can be applied even in out-of-domain (OOD) settings where existing methods cannot.

Predicting with Confidence on Unseen Distributions

This investigation determines that common distributional distances, such as Fréchet distance or Maximum Mean Discrepancy, fail to induce reliable estimates of performance under distribution shift, and finds that the proposed difference of confidences (DoC) approach yields successful estimates of a classifier’s performance over a variety of shifts and model architectures.
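The DoC idea above is simple enough to sketch directly. This is a hedged illustration under the assumption that the predicted accuracy drop equals the drop in average confidence from source to target; the array inputs are hypothetical.

```python
import numpy as np

def doc_predict_accuracy(source_conf, source_acc, target_conf):
    """Sketch of Difference of Confidences (DoC).

    Predict target accuracy as source accuracy minus the difference
    between average confidence on source data and on target data.
    """
    doc = source_conf.mean() - target_conf.mean()
    return source_acc - doc
```

For example, a model with 85% source accuracy and average confidence 0.90 on source and 0.80 on target would be predicted to reach 75% target accuracy.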

Mandoline: Model Evaluation under Distribution Shift

Empirical validation on NLP and vision tasks shows that Mandoline can estimate performance on the target distribution up to 3× more accurately than standard baselines; a density-ratio estimation framework over the slices is also described.

RATT: Leveraging Unlabeled Data to Guarantee Generalization

This work enables practitioners to certify generalization even when (labeled) holdout data is unavailable and provides insights into the relationship between random label noise and generalization.

Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift

This paper explores the problem of building ML systems that fail loudly, investigating methods for detecting dataset shift, identifying exemplars that most typify the shift, and quantifying shift malignancy, and demonstrates that domain-discriminating approaches tend to be helpful for characterizing shifts qualitatively and determining if they are harmful.

Estimating Accuracy from Unlabeled Data: A Probabilistic Logic Approach

An efficient method to estimate the accuracy of classifiers using only unlabeled data is proposed, based on the intuition that when classifiers agree, they are more likely to be correct, and when the classifiers make a prediction that violates the constraints, at least one classifier must be making an error.
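The agreement intuition above can be made concrete in a deliberately simplified form. The following is a toy sketch, not the paper's probabilistic-logic system: it assumes two binary classifiers of equal accuracy p with independent errors, in which case the agreement rate satisfies a = p² + (1 − p)², so p can be recovered from agreement alone.

```python
import numpy as np

def accuracy_from_agreement(preds_a, preds_b):
    """Toy agreement-based accuracy estimate for two binary classifiers.

    Strong assumptions: both classifiers have the same accuracy p and
    make independent errors. Then agreement rate a = p^2 + (1 - p)^2,
    which inverts to p = (1 + sqrt(2a - 1)) / 2.
    """
    a = np.mean(preds_a == preds_b)
    a = max(a, 0.5)  # agreement below 0.5 gives no usable signal here
    return 0.5 * (1.0 + np.sqrt(2.0 * a - 1.0))
```

The cited work goes well beyond this, using logical constraints among multiple classifiers to localize which one is in error; this sketch only captures the agreement-implies-correctness intuition.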

Estimating Generalization under Distribution Shifts via Domain-Invariant Representations

This work uses a set of domain-invariant predictors as a proxy for the unknown, true target labels, and enables self-tuning of domain adaptation models, and accurately estimates the target error of given models under distribution shift.

Understanding the Failure Modes of Out-of-Distribution Generalization

This work identifies the fundamental factors that give rise to why models fail this way in easy-to-learn tasks where one would expect these models to succeed, and uncovers two complementary failure modes.

Regularized Learning for Domain Adaptation under Label Shifts

We propose Regularized Learning under Label shifts (RLLS), a principled and practical domain-adaptation algorithm to correct for shifts in the label distribution between a source and a target domain.

WILDS: A Benchmark of in-the-Wild Distribution Shifts

WILDS is presented, a benchmark of in-the-wild distribution shifts spanning diverse data modalities and applications, and is hoped to encourage the development of general-purpose methods that are anchored to real-world distribution shifts and that work well across different applications and problem settings.

Predicting Unreliable Predictions by Shattering a Neural Network

This work proposes not only a theoretical framework to reason about subfunction error bounds but also a pragmatic way of approximately evaluating it, which it applies to predicting which samples the network will not successfully generalize to.