Rethinking Importance Weighting for Deep Learning under Distribution Shift
@article{Fang2020RethinkingIW, title={Rethinking Importance Weighting for Deep Learning under Distribution Shift}, author={Tongtong Fang and Nan Lu and Gang Niu and Masashi Sugiyama}, journal={ArXiv}, year={2020}, volume={abs/2006.04662} }
Under distribution shift (DS) where the training data distribution differs from the test one, a powerful technique is importance weighting (IW) which handles DS in two separate steps: weight estimation (WE) estimates the test-over-training density ratio and weighted classification (WC) trains the classifier from weighted training data. However, IW cannot work well on complex data, since WE is incompatible with deep learning. In this paper, we rethink IW and theoretically show it suffers from a…
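The abstract (before the truncation) describes the classic two-step IW pipeline. As a point of reference, here is a minimal sketch of that pipeline under assumed names: a logistic-regression domain discriminator stands in for weight estimation (WE) and a weighted cross-entropy for weighted classification (WC). This illustrates the standard recipe the paper re-examines, not the authors' proposed method.

```python
# Minimal sketch of the two-step IW pipeline discussed in the abstract (illustrative only).
# WE: approximate w(x) = p_test(x) / p_train(x) with a train-vs-test discriminator.
# WC: minimize an importance-weighted loss on the training data.
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_importance_weights(x_train, x_test):
    """WE step: density-ratio estimation via a probabilistic domain classifier."""
    X = np.vstack([x_train, x_test])
    domain = np.r_[np.zeros(len(x_train)), np.ones(len(x_test))]  # 0 = train, 1 = test
    disc = LogisticRegression(max_iter=1000).fit(X, domain)
    p_test = disc.predict_proba(x_train)[:, 1]
    # Bayes-rule conversion of discriminator outputs into a density ratio;
    # the size ratio corrects for unequal sample counts.
    return (p_test / (1.0 - p_test)) * (len(x_train) / len(x_test))

def weighted_cross_entropy(probs, y, w):
    """WC step: per-example importance weights multiply the usual log loss."""
    return -np.mean(w * np.log(probs[np.arange(len(y)), y] + 1e-12))
```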
31 Citations
LTF: A Label Transformation Framework for Correcting Target Shift
- Computer Science
- 2020
An end-to-end Label Transformation Framework (LTF) for correcting target shift is proposed, which implicitly models the shift of P(Y) and the conditional distribution P(X|Y) using neural networks, handles continuous, discrete, and even multidimensional labels in a unified way, and is scalable to large data.
Learning with Instance-Dependent Label Noise: A Sample Sieve Approach
- Computer Science, ICLR
- 2021
This paper proposes CORES^2 (COnfidence REgularized Sample Sieve), which progressively sieves out corrupted samples and provides a generic machinery for anatomizing noisy datasets and a flexible interface for various robust training techniques to further improve the performance.
Rethinking Importance Weighting for Transfer Learning
- Computer Science, ArXiv
- 2021
This article introduces a method of causal mechanism transfer that incorporates causal structure into transfer learning (TL) and reviews recent advances based on joint and dynamic importance-predictor estimation.
Mandoline: Model Evaluation under Distribution Shift
- Computer Science, ICML
- 2021
A density ratio estimation framework for the slices is described, and empirical validation on NLP and vision tasks shows that Mandoline can estimate performance on the target distribution up to 3× more accurately than standard baselines.
Adversarial Attacks and Defense for Non-Parametric Two-Sample Tests
- Computer Science, ArXiv
- 2022
This paper systematically uncovers the failure mode of non-parametric TSTs through adversarial attacks and then proposes corresponding defense strategies and a max-min optimization that iteratively generates adversarial pairs to train the deep kernels.
Learning to Bootstrap for Combating Label Noise
- Computer Science, ArXiv
- 2022
This paper proposes a more generic learnable loss objective which enables a joint reweighting of instances and labels at once, and dynamically adjusts the per-sample importance weight between the real observed labels and pseudo-labels, where the weights are efficiently determined in a meta process.
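For concreteness, here is a toy sketch of the kind of per-sample label/pseudo-label blending the entry above refers to; the blending weight w is fixed here, whereas the paper learns it in a meta process, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def bootstrap_style_loss(logits, noisy_targets, w):
    """Blend each observed (possibly noisy) label with the model's own pseudo-label
    using a per-sample weight w in [0, 1], then apply a soft cross-entropy."""
    num_classes = logits.size(1)
    onehot = F.one_hot(noisy_targets, num_classes).float()
    pseudo = F.one_hot(logits.argmax(dim=1), num_classes).float()  # hard pseudo-label
    target = w.unsqueeze(1) * onehot + (1 - w.unsqueeze(1)) * pseudo
    return -(target * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```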
NICO++: Towards Better Benchmarking for Domain Generalization
- Computer Science, ArXiv
- 2022
A large-scale benchmark with extensive labeled domains, named NICO++, is proposed along with more rational evaluation methods for comprehensively evaluating DG algorithms, showing that limited concept shift and significant covariate shift favor the evaluation capability for generalization.
New-Onset Diabetes Assessment Using Artificial Intelligence-Enhanced Electrocardiography
- Medicine, Computer Science, ArXiv
- 2022
ECG-based assessment outperforms the ADA Risk test, achieving a higher area under the curve and a positive predictive value 2.6 times the prevalence of diabetes in the cohort, suggesting that the task is beyond current clinical capabilities.
Self-Adaptive Forecasting for Improved Deep Learning on Non-Stationary Time-Series
- Computer Science, ArXiv
- 2022
A novel method, Self-Adaptive Forecasting (SAF), modifies the training of time-series forecasting models to improve their performance on forecasting tasks with non-stationary time-series data, leading to superior generalization.
Unified Perspective on Probability Divergence via Maximum Likelihood Density Ratio Estimation: Bridging KL-Divergence and Integral Probability Metrics
- Computer Science, ArXiv
- 2022
It is shown that the KL-divergence and the IPMs can be represented as maximum likelihoods differing only in their sampling schemes, and this result is used to derive a unified form of the IPMs and a relaxed estimation method.
References
Showing 1–10 of 80 references
Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss
- Computer Science, NeurIPS
- 2019
A theoretically principled label-distribution-aware margin (LDAM) loss, motivated by minimizing a margin-based generalization bound, is proposed; it replaces the standard cross-entropy objective during training and can be applied with prior strategies for training with class imbalance such as re-weighting or re-sampling.
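A minimal sketch of a margin adjustment of the kind described in the entry above, with per-class margins proportional to n_j^{-1/4}; the constants C and scale, and the function name, are assumptions rather than the reference implementation.

```python
import torch
import torch.nn.functional as F

def ldam_style_loss(logits, targets, class_counts, C=0.5, scale=30.0):
    """Subtract a per-class margin (larger for rarer classes) from the true-class
    logit before applying cross-entropy."""
    margins = C / class_counts.float().pow(0.25)                # margin_j proportional to n_j^(-1/4)
    onehot = F.one_hot(targets, logits.size(1)).float()
    adjusted = logits - onehot * margins[targets].unsqueeze(1)  # shift only the true-class logit
    return F.cross_entropy(scale * adjusted, targets)
```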
Learning to Reweight Examples for Robust Deep Learning
- Computer Science, ICML
- 2018
This work proposes a novel meta-learning algorithm that learns to assign weights to training examples based on their gradient directions; it can be easily implemented on any type of deep network, does not require any additional hyperparameter tuning, and achieves impressive performance on class-imbalance and corrupted-label problems where only a small amount of clean validation data is available.
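A toy illustration of the gradient-direction idea in the entry above (not the paper's meta-learning loop, which obtains these scores through a differentiable inner update): score each training example by the alignment of its loss gradient with the gradient on a small clean validation batch.

```python
import numpy as np

def gradient_alignment_weights(example_grads, val_grad):
    """example_grads: (n_examples, n_params) per-example loss gradients;
    val_grad: (n_params,) gradient on a clean validation batch.
    Clamp negative alignments to zero and normalize, as in the reweighting rule."""
    scores = np.maximum(example_grads @ val_grad, 0.0)
    total = scores.sum()
    return scores / total if total > 0 else np.full(len(scores), 1.0 / len(scores))
```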
How does Disagreement Help Generalization against Label Corruption?
- Computer Science, ICML
- 2019
A robust learning paradigm called Co-teaching+ is proposed, which bridges the "Update by Disagreement" strategy with the original Co-teaching and is much superior to many state-of-the-art methods in the robustness of trained models.
Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach
- Computer Science, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
It is proved that, when ReLU is the only non-linearity, the loss curvature is immune to class-dependent label noise; it is also shown how one can estimate the label-flip probabilities, adapting a recent technique for noise estimation to the multi-class setting and providing an end-to-end framework.
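The entry above refers to estimating label-flip probabilities and correcting the loss; below is a minimal sketch of the forward-correction variant, assuming a row-stochastic transition matrix T has already been estimated (names are illustrative).

```python
import torch
import torch.nn.functional as F

def forward_corrected_loss(logits, noisy_targets, T):
    """Forward correction: push the model's clean-class probabilities through the
    noise transition matrix T, where T[i, j] = P(observed = j | true = i), and
    evaluate the likelihood of the observed noisy labels."""
    clean_probs = F.softmax(logits, dim=1)
    noisy_probs = clean_probs @ T          # predicted distribution over noisy labels
    return F.nll_loss(torch.log(noisy_probs + 1e-12), noisy_targets)
```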
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- Computer Science, ICML
- 2015
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Learning Deep Representation for Imbalanced Classification
- Computer Science, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
The representation learned by this approach, when combined with a simple k-nearest neighbor (kNN) algorithm, shows significant improvements over existing methods on both high- and low-level vision classification tasks that exhibit imbalanced class distribution.
MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels
- Computer Science, ICML
- 2018
Experimental results demonstrate that learning another neural network, called MentorNet, to supervise the training of the base deep network (StudentNet) can significantly improve the generalization performance of deep networks trained on corrupted training data.
Domain-Adversarial Training of Neural Networks
- Computer Science, J. Mach. Learn. Res.
- 2016
A new representation learning approach for domain adaptation, in which data at training and test time come from similar but different distributions, is proposed; it can be achieved in almost any feed-forward model by augmenting it with a few standard layers and a new gradient reversal layer.
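The gradient reversal layer mentioned in the entry above is simple enough to sketch: identity in the forward pass, gradient negation (scaled by a factor lam) in the backward pass, so the feature extractor is trained to fool a domain classifier. A minimal PyTorch version (illustrative, not the authors' code):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity forward; multiplies the incoming gradient by -lam on the way back."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None  # no gradient for lam

# Usage: domain_logits = domain_classifier(GradReverse.apply(features, 1.0))
```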
Domain Adaptation under Target and Conditional Shift
- Computer Science, ICML
- 2013
This work considers domain adaptation under three possible scenarios, using kernel embeddings of conditional as well as marginal distributions, and proposes to estimate the weights or transformations by reweighting or transforming the training data to reproduce the covariate distribution on the test domain.
Robust Learning under Uncertain Test Distributions: Relating Covariate Shift to Model Misspecification
- Computer Science, ICML
- 2014
This paper describes the problem of learning under changing distributions as a game between a learner and an adversary, and presents an algorithm, robust covariate shift adjustment (RCSA), that provides the relevant weights.