# Self-training Avoids Using Spurious Features Under Domain Shift

@article{Chen2020SelftrainingAU, title={Self-training Avoids Using Spurious Features Under Domain Shift}, author={Yining Chen and Colin Wei and Ananya Kumar and Tengyu Ma}, journal={ArXiv}, year={2020}, volume={abs/2006.10032} }

In unsupervised domain adaptation, existing theory focuses on situations where the source and target domains are close. In practice, conditional entropy minimization and pseudo-labeling work even when the domain shifts are much larger than those analyzed by existing theory. We identify and analyze one particular setting where the domain shift can be large, but these algorithms provably work: certain spurious features correlate with the label in the source domain but are independent of the label…
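To make the pseudo-labeling idea in the abstract concrete, here is a minimal sketch, not the authors' algorithm or experimental setup. It assumes a two-feature linear classifier, a synthetic "target" sample in which column 0 carries the label and column 1 is label-independent noise, and a hypothetical source-trained starting point `w = [1, 1]`. All function names, the data generator, and the step size are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pseudo_label_loss(w, X):
    """Mean logistic loss of w against its own hard pseudo-labels sign(X @ w)."""
    margins = np.abs(X @ w)  # y_hat * (x . w) when y_hat = sign(x . w)
    return float(np.mean(np.log1p(np.exp(-margins))))

def self_training_step(w, X, lr=0.1):
    """One gradient step on the logistic loss with pseudo-labels held fixed."""
    scores = X @ w
    y_hat = np.where(scores >= 0, 1.0, -1.0)  # hard pseudo-labels from the current model
    # Gradient of log(1 + exp(-y * s)) w.r.t. w is -y * sigmoid(-y * s) * x
    grad = -(y_hat * sigmoid(-y_hat * scores)) @ X / len(X)
    return w - lr * grad

def make_target_data(n=500, seed=0):
    """Synthetic unlabeled target sample (an assumption made for illustration)."""
    rng = np.random.default_rng(seed)
    y = rng.choice([-1.0, 1.0], size=n)
    signal = y + 0.3 * rng.standard_normal(n)  # column 0: correlates with the label
    spurious = rng.standard_normal(n)          # column 1: independent of the label
    return np.column_stack([signal, spurious]), y

X_target, _ = make_target_data()   # the labels are discarded: target data is unlabeled
w = np.array([1.0, 1.0])           # hypothetical source-trained weights
losses = [pseudo_label_loss(w, X_target)]
for _ in range(200):
    w = self_training_step(w, X_target)
    losses.append(pseudo_label_loss(w, X_target))

print("final weights:", w)
print("pseudo-label loss: %.4f -> %.4f" % (losses[0], losses[-1]))
```

Each step relabels the target points with the current model and then takes a gradient step toward more confident predictions, so the pseudo-label loss shrinks over iterations. The sketch makes no claim about reproducing the paper's guarantees; it only shows the mechanics of the update.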

## 48 Citations

### Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data

- Computer Science, ICLR
- 2021

This work provides a unified theoretical analysis of self-training with deep networks for semi-supervised learning, unsupervised domain adaptation, and unsupervised learning, and proves that under these assumptions, the minimizers of population objectives based on self-training and input-consistency regularization will achieve high accuracy with respect to ground-truth labels.

### Distributionally Robust Learning for Unsupervised Domain Adaptation

- Computer Science, ArXiv
- 2020

A distributionally robust learning method for unsupervised domain adaptation (UDA) that scales to modern computer vision benchmarks, and it is demonstrated that DRST captures shape features more effectively, and reduces the extent of distributional shift during self-training.

### A Theory of Label Propagation for Subpopulation Shift

- Computer Science, ICML
- 2021

This work proposes a provably effective framework for domain adaptation based on label propagation under a simple but realistic expansion assumption, and adapts consistency-based semi-supervised learning methods to domain adaptation settings, gaining significant improvements.

### Learn what you can't learn: Regularized Ensembles for Transductive Out-of-distribution Detection

- Computer Science, ArXiv
- 2020

This paper proposes a novel method that uses an artificial labeling scheme for the test data and regularization to obtain ensembles of models that produce contradictory predictions only on the OOD samples in a test batch, and is able to significantly outperform both inductive and transductive baselines on difficult OOD detection scenarios.

### In-N-Out: Pre-Training and Self-Training using Auxiliary Information for Out-of-Distribution Robustness

- Computer Science, ICLR
- 2021

In-N-Out is introduced, which first trains a model with auxiliary inputs and uses it to pseudolabel all the in-distribution inputs, then pre-trains a model on OOD auxiliary outputs and fine-tunes this model with the pseudolabels (self-training).

### Adversarial Unlearning: Reducing Confidence Along Adversarial Directions

- Computer Science, ArXiv
- 2022

A complementary regularization strategy that reduces confidence on out-of-distribution examples lying along directions adversarially chosen to increase training loss, which can be easily integrated into training pipelines with a few lines of code.

### Towards Understanding GD with Hard and Conjugate Pseudo-labels for Test-Time Adaptation

- Computer Science, ArXiv
- 2022

This work considers a setting in which a model needs to adapt to a new domain under distribution shift, given that only unlabeled test samples from the new domain are accessible at test time, and aims at theoretically understanding GD with hard and conjugate labels for a binary classification problem.

### Self-training Converts Weak Learners to Strong Learners in Mixture Models

- Computer Science, AISTATS
- 2022

The results imply that mixture models can be learned to within ε of the Bayes-optimal accuracy using at most O(d) labeled examples and Õ(d/ε²) unlabeled examples by way of a semi-supervised self-training algorithm.

### Robust Representation Learning via Perceptual Similarity Metrics

- Computer Science, ICML
- 2021

This work proposes Contrastive Input Morphing (CIM), a representation learning framework that learns input-space transformations of the data to mitigate the effect of irrelevant input features on downstream performance and is complementary to other mutual information-based representation learning techniques.

### Distributionally Robust Learning for Uncertainty Calibration under Domain Shift

- Computer Science
- 2020

The proposed framework for learning calibrated uncertainties under domain shifts is based on the distributionally robust learning (DRL) framework, and it is demonstrated that the introduction of DRL leads to significant improvements in cross-domain performance.

## References


### Confidence Regularized Self-Training

- Computer Science, 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019

A confidence regularized self-training (CRST) framework, formulated as regularized self-training, that treats pseudo-labels as continuous latent variables jointly optimized via alternating optimization and proposes two types of confidence regularization: label regularization (LR) and model regularization (MR).

### A theory of learning from different domains

- Computer Science, Machine Learning
- 2009

A classifier-induced divergence measure that can be estimated from finite, unlabeled samples from the domains and shows how to choose the optimal combination of source and target error as a function of the divergence, the sample sizes of both domains, and the complexity of the hypothesis class.

### A DIRT-T Approach to Unsupervised Domain Adaptation

- Computer Science, ICLR
- 2018

Two novel and related models are proposed: the Virtual Adversarial Domain Adaptation (VADA) model, which combines domain adversarial training with a penalty term that punishes violation of the cluster assumption, and the Decision-boundary Iterative Refinement Training with a Teacher (DIRT-T) model, which takes the VADA model as initialization and employs natural gradient steps to further minimize the cluster assumption violation.

### Understanding Self-Training for Gradual Domain Adaptation

- Computer Science, ICML
- 2020

This work proves the first non-vacuous upper bound on the error of self-training with gradual shifts, under settings where directly adapting to the target domain can result in unbounded error.

### Learning Robust Representations by Projecting Superficial Statistics Out

- Computer Science, ICLR
- 2019

This work aims to produce a classifier that will generalize to previously unseen domains, even when domain identifiers are not available during training, and incorporates the gray-level co-occurrence matrix (GLCM) to extract patterns that prior knowledge suggests are superficial.

### Unlabeled Data Improves Adversarial Robustness

- Computer Science, NeurIPS
- 2019

It is proved that unlabeled data bridges the complexity gap between standard and robust classification: a simple semisupervised learning procedure (self-training) achieves high robust accuracy using the same number of labels required for achieving high standard accuracy.

### Unsupervised Domain Adaptation by Backpropagation

- Computer Science, ICML
- 2015

The method performs very well in a series of image classification experiments, achieving adaptation effect in the presence of big domain shifts and outperforming previous state-of-the-art on Office datasets.

### Are Labels Required for Improving Adversarial Robustness?

- Computer Science, NeurIPS
- 2019

Theoretically, it is shown that in a simple statistical setting, the sample complexity for learning an adversarially robust model from unlabeled data matches the fully supervised case up to constant factors, and this finding extends to the more realistic case where the unlabeled data is also uncurated, opening a new avenue for improving adversarial training.

### Conditional variance penalties and domain shift robustness

- Computer Science, Machine Learning
- 2020

This work assumes that the domain itself is not observed and is hence a latent variable, but that one can sometimes observe a typically discrete identifier or “ID”, which may refer, for example, to the identity of a person.

### Does Unlabeled Data Provably Help? Worst-case Analysis of the Sample Complexity of Semi-Supervised Learning

- Computer Science, COLT
- 2008

It is proved that for basic hypothesis classes over the real line, if the distribution of unlabeled data is ‘smooth’, knowledge of that distribution cannot improve the labeled sample complexity by more than a constant factor.