Corpus ID: 235899139

Shifts: A Dataset of Real Distributional Shift Across Multiple Large-Scale Tasks

Authors: Andrey Malinin, Neil Band, German Chesnokov, Yarin Gal, Mark John Francis Gales, Alexey Noskov, Andrey Ploskonosov, Liudmila Prokhorenkova, Ivan Provilkov, Vatsal Raina, Vyas Raina, Mariya Shmatova, Panos Tigas and Boris Yangel
Significant research has been done on developing methods for improving robustness to distributional shift and for uncertainty estimation. In contrast, only limited work has examined developing standard datasets and benchmarks for assessing these approaches. Additionally, most work on uncertainty estimation and robustness has developed new techniques based on small-scale regression or image classification tasks. However, many tasks of practical interest have different modalities, such as… 

Shifts 2.0: Extending The Dataset of Real Distributional Shifts

This paper extends the Shifts Dataset with two datasets sourced from industrial, high-risk applications of high societal importance, considering the tasks of segmentation of white matter Multiple Sclerosis lesions in 3D magnetic resonance brain images and the estimation of power consumption in marine cargo vessels.

Evaluating Predictive Uncertainty and Robustness to Distributional Shift Using Real World Data

This paper proposes more insightful metrics for general regression tasks using the Shifts Weather Prediction Dataset and presents an evaluation of the baseline methods using these metrics.

Improving Baselines in the Wild

This study focuses on two datasets: iWildCam and FMoW, and shows that conducting separate cross-validation for each evaluation metric is crucial for both datasets.

On the Importance of Gradients for Detecting Distributional Shifts in the Wild

GradNorm is presented, a simple and effective approach for detecting OOD inputs by utilizing information extracted from the gradient space, which employs the vector norm of gradients, backpropagated from the KL divergence between the softmax output and a uniform probability distribution.
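The GradNorm score described above can be sketched in a few lines; this is a minimal NumPy illustration for a single linear classification layer (the gradient of the cross-entropy against a uniform target with respect to the last-layer weights), not the paper's full implementation, and the function and variable names are my own:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def gradnorm_score(x, W):
    """L1 norm of the gradient of KL(uniform || softmax) w.r.t. the
    last-layer weights W (logits z = W @ x). Larger scores indicate
    in-distribution inputs; near-uniform softmax outputs yield small
    gradients and hence small scores."""
    num_classes = W.shape[0]
    p = softmax(W @ x)
    u = np.full(num_classes, 1.0 / num_classes)
    # Gradient of cross-entropy with a uniform target w.r.t. W is (p - u) x^T.
    grad = np.outer(p - u, x)
    return np.abs(grad).sum()
```

An input that produces a confident softmax yields a much larger score than one whose logits are flat, which is the separation the detector exploits.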

An Analysis of Distributional Shifts in Automated Driving Functions in Highway Scenarios

It is demonstrated that a safety critical driving function, e.g., a lane change maneuver prediction, trained on one dataset will not generalize as expected to the other dataset in the presence of these distributional shifts.

Understanding Gradual Domain Adaptation: Improved Analysis, Optimal Path and Beyond

This work analyzes gradual self-training under more general and relaxed assumptions and proves a significantly improved generalization bound. The bound implies the existence of an optimal choice of T that minimizes the generalization error, and it also naturally suggests an optimal way to construct the path of intermediate domains so as to minimize the cumulative path length TΔ between the source and target.

Towards Clear Expectations for Uncertainty Estimation

This paper questions the very rationale for quantifying uncertainty and calls for a standardized protocol for UQ evaluation based on metrics proven to be relevant for the ML practitioner.

Uncertainty Estimation for Cross-Dataset Performance in Trajectory Prediction

This paper observes the performance of two of the latest state-of-the-art trajectory prediction methods across four different datasets, presents a novel method to estimate prediction uncertainty, and shows how it can be used to achieve better performance across datasets.

More layers! End-to-end regression and uncertainty on tabular data with deep learning

An end-to-end algorithm is presented for regression with uncertainty on tabular data, based on the combination of four ideas: 1) a deep ensemble of self-normalizing neural networks, 2) regression as parameter estimation of the Gaussian target error distribution, 3) hierarchical multitask learning, and 4) simple data preprocessing.
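Idea 2 above, treating regression as estimating the parameters of a Gaussian over the target, amounts to training with the Gaussian negative log-likelihood where the network predicts both a mean and a (log-)variance per sample. A minimal NumPy sketch of that loss, with the log-variance parameterization chosen here for numerical stability (an implementation detail I am assuming, not one stated in the summary):

```python
import numpy as np

def gaussian_nll(y, mu, log_var):
    """Negative log-likelihood of target y under N(mu, exp(log_var)).
    The network would output both mu and log_var; minimizing this loss
    fits the mean and the per-sample uncertainty jointly."""
    var = np.exp(log_var)
    return 0.5 * (log_var + (y - mu) ** 2 / var + np.log(2.0 * np.pi))
```

Predicting log-variance keeps the variance positive without constraints and makes the loss well-behaved for gradient descent.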

DAFT: Distilling Adversarially Fine-tuned Models for Better OOD Generalization

This work proposes a new method – DAFT – based on the intuition that adversarially robust combination of a large number of rich features should provide OOD robustness, and demonstrates that it achieves improvements over the current state-of-the-art OOD generalization methods.

WILDS: A Benchmark of in-the-Wild Distribution Shifts

WILDS is presented, a benchmark of in-the-wild distribution shifts spanning diverse data modalities and applications; the authors hope it will encourage the development of general-purpose methods that are anchored to real-world distribution shifts and that work well across different applications and problem settings.

Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift

A large-scale benchmark of existing state-of-the-art methods on classification problems and the effect of dataset shift on accuracy and calibration is presented, finding that traditional post-hoc calibration does indeed fall short, as do several other previous methods.
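Calibration in this benchmark is typically assessed with metrics such as the expected calibration error; below is a minimal NumPy sketch of one standard ECE estimator (equal-width confidence bins, weighted by bin size). The benchmark uses several calibration metrics, so treat this as an illustrative example with names of my own choosing:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence and average |accuracy - confidence|
    over bins, weighted by the fraction of samples in each bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = correct[mask].mean()
            conf = confidences[mask].mean()
            ece += (mask.sum() / n) * abs(acc - conf)
    return ece
```

A well-calibrated model whose 80%-confidence predictions are right 80% of the time scores near zero; under dataset shift this gap typically grows, which is the degradation the benchmark measures.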

The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization

It is found that using larger models and artificial data augmentations can improve robustness on real-world distribution shifts, contrary to claims in prior work.

Reverse KL-Divergence Training of Prior Networks: Improved Uncertainty and Adversarial Robustness

This paper investigates using Prior Networks to detect adversarial attacks and proposes a generalized form of adversarial training, and shows that the appropriate training criterion for Prior Networks is the reverse KL-divergence between Dirichlet distributions.

Predictive Uncertainty Estimation via Prior Networks

This work proposes a new framework for modeling predictive uncertainty called Prior Networks (PNs) which explicitly models distributional uncertainty by parameterizing a prior distribution over predictive distributions and evaluates PNs on the tasks of identifying out-of-distribution samples and detecting misclassification on the MNIST dataset, where they are found to outperform previous methods.
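A Prior Network outputs the parameters of a Dirichlet over categorical distributions, and simple summaries of that Dirichlet serve as uncertainty measures: the entropy of the mean categorical captures total uncertainty, while a low precision (the sum of the concentration parameters) signals distributional uncertainty on out-of-distribution inputs. A minimal NumPy sketch of these two summaries (the paper uses additional measures; names here are my own):

```python
import numpy as np

def dirichlet_uncertainty(alpha):
    """Given Dirichlet concentration parameters alpha, return the
    entropy of the expected categorical E[p] = alpha / alpha0 (total
    uncertainty) and the precision alpha0 (low alpha0 with a flat mean
    indicates a flat Dirichlet, i.e. distributional uncertainty)."""
    alpha = np.asarray(alpha, dtype=float)
    alpha0 = alpha.sum()
    p = alpha / alpha0
    entropy = -(p * np.log(p)).sum()
    return entropy, alpha0
```

A sharp Dirichlet such as (10, 1, 1) yields lower entropy and higher precision than the flat (1, 1, 1), matching the intuition that confident in-distribution predictions concentrate mass on one class.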

Natural Adversarial Examples

This work introduces two challenging datasets that reliably cause machine learning model performance to substantially degrade and curates an adversarial out-of-distribution detection dataset called IMAGENET-O, which is the first out-of-distribution detection dataset created for ImageNet models.

Revisiting Deep Learning Models for Tabular Data

An overview of the main families of DL architectures for tabular data is performed, and the bar for baselines in tabular DL is raised by identifying two simple and powerful deep architectures, including a ResNet-like architecture that turns out to be a strong baseline often missing from prior work.

Generalized ODIN: Detecting Out-of-Distribution Image Without Learning From Out-of-Distribution Data

This work builds on the popular ODIN method, proposing two strategies that free it from the need to tune on OoD data while improving its OoD detection performance: a decomposed confidence score and a modified input pre-processing method.

Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles

This work proposes an alternative to Bayesian NNs that is simple to implement, readily parallelizable, requires very little hyperparameter tuning, and yields high quality predictive uncertainty estimates.
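For regression, each ensemble member predicts a Gaussian mean and variance, and the ensemble is treated as a uniform mixture of Gaussians; the combined prediction comes from the mixture's first two moments. A minimal NumPy sketch of that combination step (training of the individual networks is omitted, and the names are my own):

```python
import numpy as np

def ensemble_predict(means, variances):
    """Combine M per-member Gaussian predictions (arrays of shape
    (M, n_targets)) into one mean and variance via mixture moments.
    Total variance = average member variance (aleatoric part)
                   + variance of member means (disagreement part)."""
    mu = means.mean(axis=0)
    var = variances.mean(axis=0) + (means ** 2).mean(axis=0) - mu ** 2
    return mu, var
```

The disagreement term is what grows on shifted inputs: members that agree on training-like data diverge off-distribution, inflating the predicted variance exactly where the model should be uncertain.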

Uncertainty in Gradient Boosting via Ensembles

Experiments on a range of regression and classification datasets show that ensembles of gradient boosting models yield improved predictive performance, and measures of uncertainty successfully enable detection of out-of-domain inputs.