Corpus ID: 225066873

Robust Correction of Sampling Bias Using Cumulative Distribution Functions

Bijan Mazaheri, Siddhartha Jain, Jehoshua Bruck
Varying domains and biased datasets can lead to differences between the training and the target distributions, known as covariate shift. Current approaches for alleviating this often rely on estimating the ratio of training and target probability density functions. These techniques require parameter tuning and can be unstable across different datasets. We present a new method for handling covariate shift using the empirical cumulative distribution function estimates of the target distribution…
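The abstract refers to empirical cumulative distribution function estimates. As a minimal illustration of that building block only, the sketch below computes a one-dimensional empirical CDF with NumPy and compares a training sample against a target sample. It does not reproduce the paper's correction procedure, which the truncated abstract does not describe; the function name `empirical_cdf` and the two small samples are hypothetical and chosen purely for illustration.

```python
import numpy as np


def empirical_cdf(sample, points):
    """Return, for each entry of `points`, the fraction of `sample` values <= it."""
    sorted_sample = np.sort(np.asarray(sample, dtype=float))
    # side="right" counts the sample values less than or equal to each query point.
    counts = np.searchsorted(
        sorted_sample, np.asarray(points, dtype=float), side="right"
    )
    return counts / sorted_sample.size


# Hypothetical data: the training sample is concentrated at small values,
# while the target sample is spread over a wider range.
train = np.array([0.1, 0.2, 0.3, 0.4, 2.0])
target = np.array([0.5, 1.0, 1.5, 2.0, 2.5])

# Evaluate both empirical CDFs at the training points.
f_train = empirical_cdf(train, train)
f_target = empirical_cdf(target, train)

# Largest gap between the two empirical CDFs at the training points,
# a rough indicator of how far apart the two samples are.
max_gap = np.max(np.abs(f_train - f_target))
print(f_train, f_target, max_gap)
```

Unlike a kernel density estimate, the empirical CDF needs no bandwidth or other tuning parameter, which is consistent with the abstract's concern about parameter tuning in density-ratio approaches.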

Robust Classification Under Sample Selection Bias
This work develops a framework for learning a robust bias-aware (RBA) probabilistic classifier that adapts to different sample selection biases using a minimax estimation formulation and demonstrates the behavior and effectiveness of the approach on binary classification tasks.
Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation
This paper proposes a direct importance estimation method that does not involve density estimation and is equipped with a natural cross-validation procedure, so tuning parameters such as the kernel width can be optimized objectively.
Correcting Sample Selection Bias by Unlabeled Data
A nonparametric method is presented that directly produces resampling weights without distribution estimation; it works by matching distributions between training and testing sets in feature space.
Relative Density-Ratio Estimation for Robust Distribution Comparison
This letter uses relative divergences for distribution comparison, which involves approximation of relative density ratios, and shows that the proposed divergence estimator has asymptotic variance independent of the model complexity under a parametric setup, implying that the suggested estimator hardly overfits even with complex models.
A Least-squares Approach to Direct Importance Estimation
This paper proposes a new importance estimation method that has a closed-form solution; the leave-one-out cross-validation score can also be computed analytically, and the method is computationally efficient and simple to implement.
Covariate Shift Adaptation by Importance Weighted Cross Validation
This paper proposes a method called importance weighted cross validation (IWCV), proves that it is unbiased even under covariate shift, and states that IWCV is the only procedure that can be applied for unbiased classification under covariate shift.
Improving predictive inference under covariate shift by weighting the log-likelihood function
A class of predictive densities is derived by weighting the observed samples in maximizing the log-likelihood function. This approach is effective in cases such as sample surveys or design…
Greedy function approximation: A gradient boosting machine.
Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions…
Variable kernel density estimation
This paper considers the problem of selecting optimal bandwidths for variable (sample-point adaptive) kernel density estimation. A data-driven variable bandwidth selector is proposed,…
Soft Margins for AdaBoost
It is found that AdaBoost asymptotically achieves a hard margin distribution, i.e., the algorithm concentrates its resources on a few hard-to-learn patterns that are, interestingly, very similar to support vectors.