Corpus ID: 245650182

Optimal Representations for Covariate Shift

Yangjun Ruan, Yann Dubois, Chris J. Maddison
Machine learning systems often experience a distribution shift between training and testing. In this paper, we introduce a simple variational objective whose optima are exactly the set of all representations on which risk minimizers are guaranteed to be robust to any distribution shift that preserves the Bayes predictor, e.g., covariate shifts. Our objective has two components. First, a representation must remain discriminative for the task, i.e., some predictor must be able to simultaneously… 


Understanding new tasks through the lens of training data via exponential tilting

This work formulates a distribution shift model based on the exponential tilt assumption and learns training-data importance weights by minimizing the KL divergence between the labeled training set and an unlabeled target dataset; the learned weights can then be used for downstream tasks such as target performance evaluation, fine-tuning, and model selection.
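The tilting idea above can be sketched numerically. The following is a minimal illustration, not the paper's exact method: under an exponential tilt, source weights take the form w_i ∝ exp(θ·f(x_i)), and θ can be fit by gradient ascent on the dual of the KL objective, which drives the reweighted source feature mean toward the target feature mean. The function name and feature inputs are illustrative assumptions.

```python
import numpy as np

def exponential_tilt_weights(f_src, f_tgt, lr=0.1, steps=500):
    """Illustrative sketch: learn a tilt parameter theta so that source
    examples reweighted by w_i ∝ exp(theta · f(x_i)) match the target
    feature mean. Ascends mean_tgt[theta·f] - log mean_src[exp(theta·f)],
    a dual form of the KL-matching objective."""
    theta = np.zeros(f_src.shape[1])
    tgt_mean = f_tgt.mean(axis=0)

    def tilt(theta):
        logits = f_src @ theta
        w = np.exp(logits - logits.max())   # subtract max for stability
        return w / w.sum()                  # normalized importance weights

    for _ in range(steps):
        w = tilt(theta)
        theta += lr * (tgt_mean - w @ f_src)  # gradient of the dual
    return tilt(theta), theta
```

At the optimum the weighted source feature mean equals the target feature mean, so the returned weights can serve as plug-in importance weights for reweighted evaluation or fine-tuning.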

Improving Self-Supervised Learning by Characterizing Idealized Representations

This work characterizes properties that SSL representations should ideally satisfy, and proves necessary and sufficient conditions under which, for any task invariant to the given data augmentations, desired probes trained on that representation attain perfect accuracy.

Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations

It is demonstrated that simple last-layer retraining on large ImageNet-trained models can match or outperform state-of-the-art approaches on spurious-correlation benchmarks, at substantially lower complexity and computational expense.

Diverse Weight Averaging for Out-of-Distribution Generalization

Diverse Weight Averaging (DiWA) is proposed, which makes a simple change to this strategy: it averages the weights obtained from several independent training runs rather than from a single run, and it highlights the need for diversity through a new bias-variance-covariance-locality decomposition of the expected error.
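The averaging step itself is straightforward. As a hedged sketch (the function name and dict-of-arrays representation are illustrative, not the authors' code), DiWA-style averaging takes the parameter sets of M independently fine-tuned models sharing an initialization and averages them coordinate-wise:

```python
import numpy as np

def diwa_average(state_dicts):
    """Sketch of DiWA-style weight averaging: uniformly average the
    parameters of M models trained in independent runs from a shared
    initialization. `state_dicts` is a list of {name: ndarray} maps."""
    m = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / m
            for k in state_dicts[0]}
```

The averaged parameters define a single model evaluated once at test time, so inference cost does not grow with the number of runs.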

Is a Caption Worth a Thousand Images? A Controlled Study for Representation Learning

This work studies this question through a carefully controlled comparison of two approaches in terms of their ability to learn representations that generalize to downstream classification tasks, finding that when the pre-training dataset meets certain criteria (it is sufficiently large and contains descriptive captions with low variability), image-only methods do not match CLIP's transfer performance, even when they are trained with more image data.

NICO++: Towards Better Benchmarking for Domain Generalization

A large-scale benchmark with extensive labeled domains, named NICO++, is proposed along with more rational evaluation methods for comprehensively evaluating DG algorithms, showing that limited concept shift combined with significant covariate shift improves the benchmark's ability to evaluate generalization.

Grounding Visual Representations with Texts for Domain Generalization

This work advocates leveraging natural language supervision for the domain generalization task and introduces two modules to ground visual representations with texts containing typical human reasoning: a Visual and Textual Joint Embedder and a Textual Explanation Generator.

OOD-Probe: A Neural Interpretation of Out-of-Domain Generalization

A flexible framework that evaluates OOD systems at fine granularity using a probing module that predicts the originating domain from intermediate representations, finding that representations always encode some information about the domain.

Towards Open Set 3D Learning: A Benchmark on Object Point Clouds

This paper provides the first broad study on Open Set 3D learning with a novel testbed with settings of increasing complexity in terms of category semantic shift, and investigates the related out-of-distribution and Open Set 2D literature to understand if and how their most recent approaches are effective on 3D data.

WOODS: Benchmarks for Out-of-Distribution Generalization in Time Series Tasks

WOODS, a suite of eight challenging open-source time series benchmarks covering a diverse range of data modalities such as videos, brain recordings, and sensor signals, is presented, underscoring the new challenges posed by time series tasks.

References

Learning Optimal Representations with the Decodable Information Bottleneck

This work proposes the Decodable Information Bottleneck (DIB), a framework that considers information retention and compression from the perspective of the desired predictive family and gives rise to representations that are optimal in terms of expected test performance and can be estimated with guarantees.

Analysis of Representations for Domain Adaptation

The theory illustrates the tradeoffs inherent in designing a representation for domain adaptation and gives a new justification for a recently proposed model which explicitly minimizes the difference between the source and target domains, while at the same time maximizing the margin of the training set.

Representation Learning with Contrastive Predictive Coding

This work proposes a universal unsupervised learning approach to extract useful representations from high-dimensional data, which it calls Contrastive Predictive Coding, and demonstrates that the approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments.
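CPC's training signal is the InfoNCE objective, which doubles as a lower bound on mutual information. A minimal numerical sketch (the function name is illustrative; `scores` stands in for a learned critic's outputs):

```python
import numpy as np

def info_nce_bound(scores):
    """InfoNCE lower bound on mutual information, the objective behind
    CPC. `scores` is a [B, B] critic matrix with scores[i, j] = f(x_i, y_j);
    diagonal entries score the B positive pairs. The bound equals
    log B + mean_i log softmax(scores[i])[i], and cannot exceed log B."""
    m = scores.max(axis=1, keepdims=True)   # shift for numerical stability
    log_norm = m + np.log(np.exp(scores - m).sum(axis=1, keepdims=True))
    log_probs = scores - log_norm           # row-wise log-softmax
    return np.log(scores.shape[0]) + np.diag(log_probs).mean()
```

Maximizing this quantity over the critic tightens the bound; the log B cap is one reason larger batches help contrastive methods.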

Out-of-Distribution Generalization via Risk Extrapolation (REx)

This work introduces the principle of Risk Extrapolation (REx), and shows conceptually how this principle enables extrapolation, and demonstrates the effectiveness and scalability of instantiations of REx on various OoD generalization tasks.

Supervised Contrastive Learning

A novel training methodology is proposed that consistently outperforms cross-entropy on supervised learning tasks across different architectures and data augmentations; it modifies the batch contrastive loss, which has recently been shown to be very effective at learning powerful representations in the self-supervised setting.

On Variational Bounds of Mutual Information

This work introduces a continuum of lower bounds that encompasses previous bounds and flexibly trades off bias and variance and demonstrates the effectiveness of these new bounds for estimation and representation learning.

Domain Adaptation: Learning Bounds and Algorithms

A novel distance between distributions, discrepancy distance, is introduced that is tailored to adaptation problems with arbitrary loss functions, and Rademacher complexity bounds are given for estimating the discrepancy distance from finite samples for different loss functions.

On Learning Invariant Representations for Domain Adaptation

This paper constructs a simple counterexample showing that, contrary to common belief, learning invariant representations while minimizing source error is not sufficient to guarantee successful domain adaptation, and proposes a natural and interpretable generalization upper bound that explicitly takes such shifts into account.

Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization

The results suggest that regularization is important for worst-group generalization in the overparameterized regime, even if it is not needed for average generalization, and introduce a stochastic optimization algorithm, with convergence guarantees, to efficiently train group DRO models.
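The stochastic algorithm referred to above alternates model updates with an adversarial reweighting of groups. A hedged sketch of one such reweighting step (an exponentiated-gradient rule in the spirit of the group DRO literature; the function name and step size are illustrative):

```python
import numpy as np

def group_dro_step(q, group_losses, eta=0.1):
    """One online group-DRO-style update (sketch): the adversarial
    group weights q are multiplicatively upweighted for groups with
    high current loss, then renormalized to a distribution. The model
    is subsequently trained on the q-weighted loss."""
    q = q * np.exp(eta * group_losses)
    return q / q.sum()
```

Because high-loss groups gain weight at every step, the model's objective tracks the worst-group risk rather than the average risk.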

Impossibility Theorems for Domain Adaptation

The domain adaptation problem in machine learning occurs when the test data generating distribution differs from the one that generates the training data. It is clear that the success of learning…