Corpus ID: 219530626

Rethinking Importance Weighting for Deep Learning under Distribution Shift

@article{Fang2020RethinkingIW,
  title={Rethinking Importance Weighting for Deep Learning under Distribution Shift},
  author={Tongtong Fang and Nan Lu and Gang Niu and Masashi Sugiyama},
  journal={ArXiv},
  year={2020},
  volume={abs/2006.04662}
}
Under distribution shift (DS) where the training data distribution differs from the test one, a powerful technique is importance weighting (IW) which handles DS in two separate steps: weight estimation (WE) estimates the test-over-training density ratio and weighted classification (WC) trains the classifier from weighted training data. However, IW cannot work well on complex data, since WE is incompatible with deep learning. In this paper, we rethink IW and theoretically show it suffers from a… 
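
As context for the two-step pipeline the abstract describes, here is a minimal, hedged Python/PyTorch sketch of classical importance weighting: the weight estimation (WE) step is approximated by a logistic domain discriminator whose output is converted into the test-over-training density ratio, and the weighted classification (WC) step trains a classifier with an importance-weighted cross-entropy loss. This is an illustrative baseline only, not the method proposed in the paper; the function names, model sizes, optimizer settings, and synthetic data are assumptions made for the sketch.

# Minimal sketch of the classical two-step importance weighting (IW) pipeline:
# WE via probabilistic classification between training and test inputs,
# then WC with an importance-weighted loss. Illustrative only; not the
# method proposed in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

def estimate_importance_weights(x_train, x_test, epochs=200, lr=1e-2):
    """WE step: fit a domain discriminator d(x) ~ P(test | x) and convert it
    into the test-over-training density ratio w(x) = (n_tr/n_te) * d(x)/(1-d(x))."""
    disc = nn.Linear(x_train.shape[1], 1)          # simple logistic discriminator
    opt = torch.optim.Adam(disc.parameters(), lr=lr)
    x = torch.cat([x_train, x_test], dim=0)
    y = torch.cat([torch.zeros(len(x_train)), torch.ones(len(x_test))])
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.binary_cross_entropy_with_logits(disc(x).squeeze(1), y)
        loss.backward()
        opt.step()
    with torch.no_grad():
        p_test = torch.sigmoid(disc(x_train).squeeze(1))
        ratio = p_test / (1.0 - p_test).clamp_min(1e-6)
        return ratio * (len(x_train) / len(x_test))

def weighted_classification(x_train, y_train, w, num_classes, epochs=200, lr=1e-2):
    """WC step: train a classifier with importance-weighted cross-entropy."""
    model = nn.Linear(x_train.shape[1], num_classes)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        per_sample = F.cross_entropy(model(x_train), y_train, reduction="none")
        loss = (w * per_sample).mean()             # weight each training example
        loss.backward()
        opt.step()
    return model

# Toy usage with synthetic covariate shift (illustrative only).
torch.manual_seed(0)
x_tr = torch.randn(500, 2)
y_tr = (x_tr[:, 0] > 0).long()
x_te = torch.randn(300, 2) + torch.tensor([1.0, 0.0])   # shifted test inputs
weights = estimate_importance_weights(x_tr, x_te)
clf = weighted_classification(x_tr, y_tr, weights, num_classes=2)

The per-sample weights produced by the WE step simply rescale the training loss in the WC step; the abstract's point is that this decoupled pipeline breaks down on complex data, where WE itself is incompatible with deep learning.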


LTF: A Label Transformation Framework for Correcting Target Shift
TLDR
An end-to-end Label Transformation Framework (LTF) for correcting target shift is proposed, which implicitly models the shift of P_Y and the conditional distribution P_{X|Y} using neural networks, can handle continuous, discrete, and even multidimensional labels in a unified way, and is scalable to large data.
Learning with Instance-Dependent Label Noise: A Sample Sieve Approach
TLDR
This paper proposes CORES^2 (COnfidence REgularized Sample Sieve), which progressively sieves out corrupted samples and provides a generic machinery for anatomizing noisy datasets and a flexible interface for various robust training techniques to further improve the performance.
Rethinking Importance Weighting for Transfer Learning
TLDR
This article introduces a method of causal mechanism transfer that incorporates causal structure in TL and reviews recent advances based on joint and dynamic importance-predictor estimation.
Mandoline: Model Evaluation under Distribution Shift
TLDR
Empirical validation on NLP and vision tasks shows that Mandoline can estimate performance on the target distribution up to 3× more accurately than standard baselines, and a density ratio estimation framework for the slices is described.
Adversarial Attacks and Defense for Non-Parametric Two-Sample Tests
TLDR
This paper systematically uncovers the failure mode of non-parametric TSTs through adversarial attacks and then proposes corresponding defense strategies and a max-min optimization that iteratively generates adversarial pairs to train the deep kernels.
Learning to Bootstrap for Combating Label Noise
TLDR
This paper proposes a more generic learnable loss objective which enables a joint reweighting of instances and labels at once, and dynamically adjusts the per-sample importance weight between the real observed labels and pseudo-labels, where the weights are efficiently determined in a meta process.
NICO++: Towards Better Benchmarking for Domain Generalization
TLDR
A large-scale benchmark with extensive labeled domains named NICO++ is proposed along with more rational evaluation methods for comprehensively evaluating DG algorithms, to prove that limited concept shift and significant covariate shift favor the evaluation capability for generalization.
New-Onset Diabetes Assessment Using Artificial Intelligence-Enhanced Electrocardiography
TLDR
ECG-based assessment outperforms the ADA Risk test, achieving a higher area under the curve and a positive predictive value 2.6 times the prevalence of diabetes in the cohort, suggesting that the task is beyond current clinical capabilities.
Self-Adaptive Forecasting for Improved Deep Learning on Non-Stationary Time-Series
TLDR
A novel method, Self-Adaptive Forecasting (SAF), to modify the training of time-series forecasting models to improve their performance on forecasting tasks with non-stationary time-series data, leading to superior generalization.
Unified Perspective on Probability Divergence via Maximum Likelihood Density Ratio Estimation: Bridging KL-Divergence and Integral Probability Metrics
TLDR
It is shown that the KL-divergence and the IPMs can be represented as maximum likelihoods differing only by sampling schemes, and this result is used to derive a unified form of the IPMs and a relaxed estimation method.

References

SHOWING 1-10 OF 80 REFERENCES
Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss
TLDR
A theoretically-principled label-distribution-aware margin (LDAM) loss motivated by minimizing a margin-based generalization bound is proposed that replaces the standard cross-entropy objective during training and can be applied with prior strategies for training with class imbalance, such as re-weighting or re-sampling.
Learning to Reweight Examples for Robust Deep Learning
TLDR
This work proposes a novel meta-learning algorithm that learns to assign weights to training examples based on their gradient directions that can be easily implemented on any type of deep network, does not require any additional hyperparameter tuning, and achieves impressive performance on class imbalance and corrupted label problems where only a small amount of clean validation data is available.
How does Disagreement Help Generalization against Label Corruption?
TLDR
A robust learning paradigm called Co-teaching+ is proposed, which bridges the "Update by Disagreement" strategy with the original Co-teaching and is much superior to many state-of-the-art methods in the robustness of trained models.
Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach
TLDR
It is proved that, when ReLU is the only non-linearity, the loss curvature is immune to class-dependent label noise, and it is shown how one can estimate the label-flipping probabilities, adapting a recent technique for noise estimation to the multi-class setting and providing an end-to-end framework.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
TLDR
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Learning Deep Representation for Imbalanced Classification
TLDR
The representation learned by this approach, when combined with a simple k-nearest neighbor (kNN) algorithm, shows significant improvements over existing methods on both high- and low-level vision classification tasks that exhibit imbalanced class distribution.
MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels
TLDR
Experimental results demonstrate that the proposed novel technique of learning another neural network, called MentorNet, to supervise the training of the base deep networks, namely, StudentNet, can significantly improve the generalization performance of deep networks trained on corrupted training data.
Domain-Adversarial Training of Neural Networks
TLDR
A new representation learning approach for domain adaptation, in which data at training and test time come from similar but different distributions, which can be achieved in almost any feed-forward model by augmenting it with a few standard layers and a new gradient reversal layer.
Domain Adaptation under Target and Conditional Shift
TLDR
This work considers domain adaptation under three possible scenarios, uses kernel embeddings of conditional as well as marginal distributions, and proposes to estimate the weights or transformations by reweighting or transforming training data to reproduce the covariate distribution on the test domain.
Robust Learning under Uncertain Test Distributions: Relating Covariate Shift to Model Misspecification
TLDR
This paper describes the problem of learning under changing distributions as a game between a learner and an adversary, and provides an algorithm, robust covariate shift adjustment (RCSA), that provides relevant weights.