Corpus ID: 245005710

Extending the WILDS Benchmark for Unsupervised Adaptation

@article{Sagawa2021ExtendingTW,
  title={Extending the WILDS Benchmark for Unsupervised Adaptation},
  author={Shiori Sagawa and Pang Wei Koh and Tony Lee and Irena Gao and Sang Michael Xie and Kendrick Shen and Ananya Kumar and Weihua Hu and Michihiro Yasunaga and Henrik Marklund and Sara Beery and Etienne David and Ian Stavness and Wei Guo and Jure Leskovec and Kate Saenko and Tatsunori B. Hashimoto and Sergey Levine and Chelsea Finn and Percy Liang},
  journal={ArXiv},
  year={2021},
  volume={abs/2112.05090}
}
Machine learning systems deployed in the wild are often trained on a source distribution but deployed on a different target distribution. Unlabeled data can be a powerful point of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data and can often be obtained from distributions beyond the source distribution as well. However, existing distribution shift benchmarks with unlabeled data do not reflect the breadth of scenarios that arise in real… 
Algorithms and Theory for Supervised Gradual Domain Adaptation
TLDR
The problem of supervised gradual domain adaptation, where labeled data from shifting distributions are available to the learner along the trajectory, is studied, and a min-max learning objective to learn the representation and classifier simultaneously is proposed.
Can domain adaptation make object recognition work for everyone?
TLDR
The inefficacy of standard DA methods at Geographical DA is demonstrated, highlighting the need for specialized geographical adaptation solutions to address the challenge of making object recognition work for everyone.
WOODS: Benchmarks for Out-of-Distribution Generalization in Time Series Tasks
TLDR
WOODS, a set of eight challenging open-source time series benchmarks covering a diverse range of data modalities such as videos, brain recordings, and sensor signals, is presented, underscoring the new challenges posed by time series tasks.
A Broad Study of Pre-training for Domain Generalization and Adaptation
TLDR
It is observed that simply using a state-of-the-art backbone outperforms existing state-of-the-art domain adaptation baselines and sets new baselines on Office-Home and DomainNet, improving by 10.7% and 5.5%, respectively.
Understanding Gradual Domain Adaptation: Improved Analysis, Optimal Path and Beyond
TLDR
This work analyzes gradual self-training under more general and relaxed assumptions and proves a significantly improved generalization bound, which implies the existence of an optimal choice of T that minimizes the generalization error and naturally suggests an optimal way to construct the path of intermediate domains so as to minimize the accumulated path length T∆ between the source and target.
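The gradual self-training procedure analyzed in this line of work can be summarized in a few lines: a model fit on the labeled source domain pseudo-labels the next intermediate domain, is refit on those pseudo-labels, and the process repeats along the path to the target. Below is a minimal sketch using scikit-learn-style estimators; the function and variable names are illustrative and not taken from the cited papers.

from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

def gradual_self_train(X_source, y_source, intermediate_domains):
    # Fit on the labeled source domain first.
    model = LogisticRegression(max_iter=1000).fit(X_source, y_source)
    for X_step in intermediate_domains:            # domains ordered from source toward target
        pseudo = model.predict(X_step)             # hard pseudo-labels from the current model
        model = clone(model).fit(X_step, pseudo)   # refit on the pseudo-labeled domain
    return model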
Beyond Separability: Analyzing the Linear Transferability of Contrastive Representations to Related Subpopulations
TLDR
It is proved that contrastive representations capture relationships between subpopulations in the positive-pair graph: linear transferability can occur when data from the same class in different domains are connected in the graph.
Amortized Prompt: Guide CLIP to Domain Transfer Learning
TLDR
This work proposes AP (Amortized Prompt) as a novel prompt strategy for domain inference in the form of prompt generation and shows that combining domain prompt inference with CLIP enables the model to outperform strong DG baselines and other prompt strategies.
Understanding Why Generalized Reweighting Does Not Improve Over ERM
TLDR
This work first posits the class of Generalized Reweighting (GRW) algorithms, a broad category of approaches that iteratively update model parameters based on reweighting of the training samples, and shows that when overparameterized models are trained under GRW, the resulting models are close to those obtained by ERM.
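As a concrete illustration of this class of methods, one GRW-style step replaces the average loss with a weighted average whose group weights are updated from the current losses. The sketch below is a generic exponential-reweighting variant written in PyTorch; the update rule, group structure, and names are our assumptions, not the paper's exact algorithm.

import torch

def grw_step(model, loss_fn, x, y, group, weights, eta=0.01):
    # Per-sample losses (loss_fn must be constructed with reduction='none').
    losses = loss_fn(model(x), y)
    # Aggregate losses per group, then upweight the worse-off groups.
    group_loss = torch.zeros_like(weights).scatter_add_(0, group, losses.detach())
    weights = weights * torch.exp(eta * group_loss)
    weights = weights / weights.sum()              # keep the weights on the simplex
    # Weighted ERM objective; the caller applies optimizer.step() afterwards.
    (weights[group] * losses).sum().backward()
    return weights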
DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery - A Focus on Affinity Prediction Problems with Noise Annotations
TLDR
This work presents DrugOOD, a systematic OOD dataset curator and benchmark for AI-aided drug discovery, which comes with an open-source Python package that fully automates the data curation and OOD benchmarking processes.

References

SHOWING 1-10 OF 178 REFERENCES
Accuracy on the Line: on the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization
TLDR
This paper empirically shows that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts, and provides a candidate theory based on a Gaussian data model that shows how changes in the data covariance arising from distribution shift can affect the observed correlations.
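Given per-model accuracies, the reported trend can be checked directly; the paper fits the relationship after a probit transform of the accuracies. A minimal sketch assuming two arrays of in-distribution and out-of-distribution accuracies follows; the function name and toy numbers are illustrative.

import numpy as np
from scipy.stats import linregress, norm

def accuracy_on_the_line(id_acc, ood_acc, eps=1e-6):
    # Probit-transform the accuracies, then fit a linear trend.
    id_probit = norm.ppf(np.clip(id_acc, eps, 1 - eps))
    ood_probit = norm.ppf(np.clip(ood_acc, eps, 1 - eps))
    fit = linregress(id_probit, ood_probit)
    return fit.slope, fit.intercept, fit.rvalue

# Toy example with a handful of hypothetical models.
print(accuracy_on_the_line(
    np.array([0.70, 0.80, 0.85, 0.90, 0.95]),
    np.array([0.50, 0.62, 0.68, 0.75, 0.83])))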
Adaptive Methods for Real-World Domain Generalization
TLDR
This work proposes a domain-adaptive approach consisting of two steps: a) the authors first learn a discriminative domain embedding from unsupervised training examples, and b) use this domain embedding as supplementary information to build a domain-adaptive model that takes both the input and its domain into account while making predictions.
Return of Frustratingly Easy Domain Adaptation
TLDR
This work proposes a simple, effective, and efficient method for unsupervised domain adaptation called CORrelation ALignment (CORAL), which minimizes domain shift by aligning the second-order statistics of source and target distributions, without requiring any target labels.
Deep CORAL: Correlation Alignment for Deep Domain Adaptation
TLDR
This paper extends CORAL to learn a nonlinear transformation that aligns correlations of layer activations in deep neural networks (Deep CORAL), and shows state-of-the-art performance on standard benchmark datasets.
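Both CORAL entries above rest on the same computation: match the second-order statistics (covariances) of source and target features. Below is a minimal PyTorch sketch of a CORAL-style loss, the squared Frobenius distance between the two feature covariance matrices scaled by 1/(4d^2); treat it as an illustration under these assumptions, not the reference implementation.

import torch

def coral_loss(source_feats, target_feats):
    # Squared Frobenius distance between feature covariances, scaled by 1 / (4 d^2).
    d = source_feats.size(1)

    def covariance(x):
        x = x - x.mean(dim=0, keepdim=True)    # center the features
        return x.t() @ x / (x.size(0) - 1)     # (d x d) covariance estimate

    diff = covariance(source_feats) - covariance(target_feats)
    return (diff * diff).sum() / (4 * d * d)

# Typical use during training: loss = task_loss + coral_weight * coral_loss(f_s, f_t)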
Deep Hashing Network for Unsupervised Domain Adaptation
TLDR
This is the first research effort to exploit the feature-learning capabilities of deep neural networks to learn representative hash codes for the domain adaptation problem; it proposes a novel deep learning framework that exploits labeled source data and unlabeled target data to learn informative hash codes and accurately classify unseen target data.
Domain-Adversarial Training of Neural Networks
TLDR
A new representation learning approach for domain adaptation, in which data at training and test time come from similar but different distributions, which can be achieved in almost any feed-forward model by augmenting it with a few standard layers and a new gradient reversal layer.
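The gradient reversal layer referred to here is a small autograd trick: identity on the forward pass, gradients multiplied by a negative constant on the backward pass, so the feature extractor is pushed to fool a domain classifier. A minimal PyTorch sketch follows; the module and function names are ours, not from the paper's code.

import torch

class GradReverse(torch.autograd.Function):
    # Identity on the forward pass; flips and scales gradients on the backward pass.
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None   # no gradient for lambd

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Usage: domain_logits = domain_head(grad_reverse(features, lambd))
# Minimizing the domain loss then trains the feature extractor adversarially.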
Asymmetric Tri-training for Unsupervised Domain Adaptation
TLDR
This work proposes the use of an asymmetric tri-training method for unsupervised domain adaptation, where two networks are used to label unlabeled target samples and one network is trained on the pseudo-labeled samples to obtain target-discriminative representations.
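The labeling rule described above (two networks must agree on a target sample before it is pseudo-labeled for the third network) can be sketched as a simple filtering step. A minimal PyTorch illustration assuming softmax outputs from the two labeling networks follows; the confidence threshold and names are illustrative, not the paper's exact procedure.

import torch

def agreeing_pseudo_labels(probs_1, probs_2, threshold=0.9):
    # Keep target samples where both labeling networks agree and are confident enough.
    conf_1, pred_1 = probs_1.max(dim=1)
    conf_2, pred_2 = probs_2.max(dim=1)
    keep = (pred_1 == pred_2) & (torch.minimum(conf_1, conf_2) > threshold)
    idx = keep.nonzero(as_tuple=True)[0]
    return idx, pred_1[idx]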
In Search of Lost Domain Generalization
TLDR
This paper implements DomainBed, a testbed for domain generalization including seven multi-domain datasets, nine baseline algorithms, and three model selection criteria, and finds that, when carefully implemented, empirical risk minimization shows state-of-the-art performance across all datasets.
Dataset Shift in Machine Learning
TLDR
This volume offers an overview of current efforts to deal with dataset and covariate shift, and places dataset shift in relationship to transfer learning, transduction, local learning, active learning, and semi-supervised learning.
Strategies for Pre-training Graph Neural Networks
TLDR
A new strategy and self-supervised methods for pre-training Graph Neural Networks (GNNs) that avoid negative transfer and improve generalization significantly across downstream tasks, leading to up to 9.4% absolute improvement in ROC-AUC over non-pre-trained models and achieving state-of-the-art performance for molecular property prediction and protein function prediction.