Corpus ID: 249953660

Regression with Label Permutation in Generalized Linear Model

@inproceedings{Fang2022RegressionWL,
  title={Regression with Label Permutation in Generalized Linear Model},
  author={Guanhua Fang and Ping Li},
  year={2022}
}
The assumption that response and predictor belong to the same statistical unit may be violated in practice. Unbiased estimation and recovery of the true label ordering based on unlabeled data are challenging tasks and have attracted increasing attention in the recent literature. In this paper, we present a relatively complete analysis of the label permutation problem for the generalized linear model with multivariate responses. The theory is established under different scenarios, with knowledge of…
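
For intuition about the problem setup, here is a small Python sketch (an illustration only; the dimensions, the logistic link, the univariate response, and the 20% mismatch rate are all assumptions rather than the paper's setting). It generates data from a generalized linear model and then permutes the labels of a fraction of the observations, breaking the correspondence between responses and predictors.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 500, 5

    # Design matrix and true coefficient vector (illustrative values).
    X = rng.normal(size=(n, d))
    beta = rng.normal(size=d)

    # Generalized linear model with a logit link: y_i ~ Bernoulli(sigmoid(x_i' beta)).
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    y = rng.binomial(1, p)

    # Permute the labels of a fraction of the observations, so those rows
    # no longer pair the response with its own predictor.
    frac_permuted = 0.2
    idx = rng.choice(n, size=int(frac_permuted * n), replace=False)
    y[idx] = y[rng.permutation(idx)]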

References

Showing 1–10 of 45 references

Optimal Estimator for Unlabeled Linear Regression

TLDR
This paper proposes a one-step estimator for unlabeled linear regression that is optimal from both computational and statistical perspectives and exhibits the same order of computational complexity as the oracle case.

Linear regression with sparsely permuted data

TLDR
This paper considers the common scenario of "sparsely permuted data," in which only a small fraction of the data is affected by a mismatch between response and predictors, and proposes treating the permuted data as outliers, which motivates the use of robust regression formulations to estimate the regression parameter.
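
A minimal sketch of the "mismatches as outliers" idea, using scikit-learn's HuberRegressor as one possible robust formulation; the data-generating values and the 5% mismatch rate are assumptions for illustration, not the paper's exact procedure.

    import numpy as np
    from sklearn.linear_model import HuberRegressor, LinearRegression

    rng = np.random.default_rng(1)
    n, d = 300, 4
    X = rng.normal(size=(n, d))
    beta = np.array([1.0, -2.0, 0.5, 3.0])
    y = X @ beta + 0.1 * rng.normal(size=n)

    # Sparsely permuted data: mismatch 5% of the responses.
    idx = rng.choice(n, size=n // 20, replace=False)
    y[idx] = y[rng.permutation(idx)]

    # A robust loss downweights the mismatched rows, which behave like outliers.
    huber = HuberRegressor().fit(X, y)
    ols = LinearRegression().fit(X, y)
    print("Huber error:", np.linalg.norm(huber.coef_ - beta))
    print("OLS error:  ", np.linalg.norm(ols.coef_ - beta))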

A Sparse Representation-Based Approach to Linear Regression with Partially Shuffled Labels

TLDR
It turns out that in this situation, estimation of the regression parameter on the one hand and recovery of the underlying permutation on the other hand can be decoupled so that the computational hardness associated with the latter can be sidestepped.

A Two-Stage Approach to Multivariate Linear Regression with Sparsely Mismatched Data

TLDR
It is shown that the conditions for permutation recovery become considerably less stringent as the number of responses $m$ per observation increases, and the required signal-to-noise ratio no longer depends on the sample size $n$.

Linear Regression with Shuffled Labels

TLDR
This work proposes several estimators that recover the weights of a noisy linear model from labels that are shuffled by an unknown permutation, and shows that the analog of the classical least-squares estimator produces inconsistent estimates in this setting.
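
A tiny simulation of the shuffled-labels setup may help fix ideas; it simply contrasts ordinary least squares on matched versus fully shuffled data (not the permutation-minimizing estimator analyzed in the paper), and all parameter values are assumed.

    import numpy as np

    rng = np.random.default_rng(2)
    n, d = 1000, 3
    X = rng.normal(size=(n, d))
    beta = np.array([2.0, -1.0, 0.5])
    y = X @ beta + 0.1 * rng.normal(size=n)

    # Shuffle the labels with an unknown permutation.
    y_shuffled = rng.permutation(y)

    # Least squares on correctly matched vs. shuffled data.
    beta_matched, *_ = np.linalg.lstsq(X, y, rcond=None)
    beta_shuffled, *_ = np.linalg.lstsq(X, y_shuffled, rcond=None)
    print("matched fit: ", beta_matched)   # close to the true beta
    print("shuffled fit:", beta_shuffled)  # close to zero, far from beta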

Stochastic EM for Shuffled Linear Regression

TLDR
This work proposes a framework that treats the unknown permutation as a latent variable and maximizes the likelihood of the observations using a stochastic expectation-maximization (EM) approach, and shows on synthetic data that the proposed stochastic EM algorithm has several advantages, including lower parameter error, less sensitivity to the choice of initialization, and significantly better performance on datasets that are only partially shuffled.
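
A rough way to see the latent-permutation idea is a hard alternating scheme rather than the paper's stochastic EM: given the current coefficients, choose the permutation that pairs sorted responses with sorted fitted values, then refit by least squares. The sketch below assumes a univariate response, a 30% shuffling rate, and illustrative parameters; like the methods discussed above, it is sensitive to initialization.

    import numpy as np

    rng = np.random.default_rng(3)
    n, d = 300, 3
    X = rng.normal(size=(n, d))
    beta_true = np.array([1.5, -0.5, 2.0])
    y = X @ beta_true + 0.05 * rng.normal(size=n)

    # Partially shuffle 30% of the responses.
    idx = rng.choice(n, size=int(0.3 * n), replace=False)
    y[idx] = y[rng.permutation(idx)]

    # Initialize from the naive least-squares fit, then alternate.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    for _ in range(20):
        # "E"-like step: the squared-error-optimal permutation pairs the
        # sorted responses with the sorted fitted values.
        fitted = X @ beta
        y_aligned = np.empty(n)
        y_aligned[np.argsort(fitted)] = np.sort(y)
        # "M"-like step: refit the coefficients on the realigned pairs.
        beta, *_ = np.linalg.lstsq(X, y_aligned, rcond=None)

    print("estimated beta:", beta)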

A Pseudo-Likelihood Approach to Linear Regression With Partially Shuffled Data

TLDR
A method is presented to adjust for mismatches between predictors and responses under "partial shuffling," in which a sufficiently large fraction of (predictor, response) pairs is observed in their correct correspondence; it is based on a pseudo-likelihood in which each term takes the form of a two-component mixture density.
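
A minimal sketch of a two-component pseudo-likelihood of this flavor, under several simplifying assumptions: the mismatch fraction is taken as known, the noise scale is estimated from a naive least-squares fit, and the mismatched terms are modeled by a normal fit to the marginal of the responses. It illustrates the mixture idea, not the paper's exact construction.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    rng = np.random.default_rng(4)
    n, d = 300, 3
    X = rng.normal(size=(n, d))
    beta_true = np.array([1.0, -1.0, 0.5])
    y = X @ beta_true + 0.1 * rng.normal(size=n)
    idx = rng.choice(n, size=int(0.2 * n), replace=False)
    y[idx] = y[rng.permutation(idx)]              # 20% mismatched pairs

    alpha = 0.2                                   # mismatch fraction, assumed known
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None) # naive fit for initialization
    sigma = np.std(y - X @ beta0)                 # crude working noise scale
    marginal = norm(loc=y.mean(), scale=y.std())  # model for mismatched terms

    def neg_pseudo_loglik(beta):
        # Each term mixes the "correctly matched" density with the marginal of y.
        matched = norm.pdf(y - X @ beta, scale=sigma)
        return -np.sum(np.log((1 - alpha) * matched + alpha * marginal.pdf(y)))

    beta_hat = minimize(neg_pseudo_loglik, x0=beta0, method="BFGS").x
    print("estimated beta:", beta_hat)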

Estimation in Exponential Family Regression Based on Linked Data Contaminated by Mismatch Error

TLDR
A method based on observation-specific offsets and $\ell_1$-penalization is proposed to account for potential mismatches, and its statistical properties are discussed.
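
The offset idea can be sketched in the Gaussian (linear-model) special case: augment the model with one offset per observation and apply an $\ell_1$ penalty to the offsets, here via a simple block coordinate descent. The penalty level and data-generating values are assumptions for illustration, not the paper's exponential-family formulation or tuning.

    import numpy as np

    rng = np.random.default_rng(5)
    n, d = 300, 3
    X = rng.normal(size=(n, d))
    beta_true = np.array([1.0, 2.0, -1.0])
    y = X @ beta_true + 0.1 * rng.normal(size=n)
    idx = rng.choice(n, size=int(0.1 * n), replace=False)
    y[idx] = y[rng.permutation(idx)]               # 10% mismatched rows

    lam = 0.5                                      # penalty level (assumed, not tuned)
    soft = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    beta = np.zeros(d)
    gamma = np.zeros(n)
    for _ in range(100):
        # Offset step: offsets absorb large residuals; the l1 penalty keeps most at zero.
        gamma = soft(y - X @ beta, lam)
        # Coefficient step: refit on the offset-corrected responses.
        beta, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)

    print("estimated beta:", beta)
    print("rows flagged as mismatched:", np.flatnonzero(gamma))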

Optimal Permutation Recovery in Permuted Monotone Matrix Model

TLDR
An estimator based on the best linear projection is proposed, which is shown to be minimax rate-optimal for both exact recovery and partial recovery, as quantified by the normalized Kendall’s tau distance.
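
A small sketch of linear-projection-based ordering recovery: score each observed row by its projection onto the leading right singular vector and sort the scores. This particular projection and the simulation values are assumptions for illustration, not necessarily the paper's exact estimator.

    import numpy as np
    from scipy.stats import kendalltau

    rng = np.random.default_rng(6)
    n, m = 50, 20

    # Signal matrix with monotone columns: each column increases with the row index.
    theta = np.outer(np.linspace(0.0, 3.0, n), rng.uniform(0.5, 1.5, size=m))

    perm = rng.permutation(n)                        # unknown row permutation
    Y = theta[perm] + 0.3 * rng.normal(size=(n, m))  # noisy, permuted observations

    # Score each row by its projection onto the leading right singular vector
    # (sign-fixed), then the estimated ordering is the ordering of the scores.
    _, _, vt = np.linalg.svd(Y, full_matrices=False)
    v = vt[0] * np.sign(vt[0].sum())
    scores = Y @ v

    # Kendall's tau between the estimated ordering and the true one.
    tau, _ = kendalltau(scores, perm)
    print("Kendall's tau:", tau)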

Permutation Recovery from Multiple Measurement Vectors in Unlabeled Sensing

TLDR
This paper provides an affirmative answer for an oracle setting in which the matrix of signals is known, by establishing matching upper and lower bounds on the required signal-to-noise ratio (SNR), which, as distinguished from the case of a single measurement vector, involves a dependence on the stable rank of the matrix of signals.
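
In the oracle setting described above, the maximum-likelihood permutation under Gaussian noise reduces to a linear assignment problem, which the following sketch solves with scipy.optimize.linear_sum_assignment; the dimensions and noise level are assumed for illustration.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    rng = np.random.default_rng(7)
    n, m = 100, 10                                   # n rows, m measurement vectors

    B = rng.normal(size=(n, m))                      # oracle: the signal matrix is known
    perm = rng.permutation(n)                        # unknown permutation
    Y = B[perm] + 0.5 * rng.normal(size=(n, m))      # noisy, permuted observations

    # Under Gaussian noise, the maximum-likelihood permutation minimizes the
    # total squared distance between observed rows and signal rows, which is
    # a linear assignment problem on the pairwise cost matrix.
    cost = ((Y[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    _, est_perm = linear_sum_assignment(cost)

    print("fraction of rows correctly matched:", np.mean(est_perm == perm))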