Corpus ID: 4570705

Stochastic EM for Shuffled Linear Regression

@article{Abid2018StochasticEF,
  title={Stochastic EM for Shuffled Linear Regression},
  author={Abubakar Abid and James Y. Zou},
  journal={ArXiv},
  year={2018},
  volume={abs/1804.00681}
}
We consider the problem of inference in a linear regression model in which the relative ordering of the input features and output labels is not known. Such datasets naturally arise from experiments in which the samples are shuffled or permuted during the protocol. In this work, we propose a framework that treats the unknown permutation as a latent variable. We maximize the likelihood of observations using a stochastic expectation-maximization (EM) approach. We compare this to the dominant… 
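As a rough illustration of the latent-permutation idea described above, here is a minimal sketch of a hard-assignment EM loop for shuffled linear regression (assuming Gaussian noise; the function name and the use of a linear assignment solver in the E-step are illustrative choices, not the authors' exact algorithm, which is a stochastic variant):

import numpy as np
from scipy.optimize import linear_sum_assignment

def em_shuffled_regression(X, y, n_iters=50, rng=None):
    """Illustrative sketch: estimate w when y is an unknown permutation of X @ w + noise.

    E-step: given the current w, estimate the permutation by solving a linear
    assignment problem on squared residuals.
    M-step: given that matching, refit w by ordinary least squares.
    (A stochastic EM variant would randomize the E-step, e.g. by subsampling
    rows or sampling plausible assignments instead of always taking the minimizer.)
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    w = rng.normal(size=d)                              # random initialization
    for _ in range(n_iters):
        pred = X @ w
        cost = (pred[:, None] - y[None, :]) ** 2        # cost[i, j]: match y_j to row i of X
        row_ind, col_ind = linear_sum_assignment(cost)  # best one-to-one matching
        y_matched = np.empty_like(y)
        y_matched[row_ind] = y[col_ind]                 # reorder y to follow X's row order
        w, *_ = np.linalg.lstsq(X, y_matched, rcond=None)  # M-step: ordinary least squares
    return w

For example, with X drawn at random, a fixed w_true, and y a shuffled noisy copy of X @ w_true, the loop above alternates matching and refitting; how reliably it recovers w_true depends on the noise level and the initialization.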
Shuffled Linear Regression with Erroneous Observations
TLDR
An optimal recursive algorithm is proposed that updates the estimate from an underdetermined function based on a first-order permutation-invariant constraint, aiming at per-iteration minimization of the mean-square estimation error.
A Pseudo-Likelihood Approach to Linear Regression With Partially Shuffled Data
TLDR
A method is presented to adjust for such mismatches under “partial shuffling,” in which a sufficiently large fraction of (predictor, response) pairs is observed in correct correspondence; it is based on a pseudo-likelihood in which each term takes the form of a two-component mixture density.
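As a hedged sketch of what such a two-component mixture term might look like (the paper's exact components may differ): with mismatch fraction $\alpha$, each pseudo-likelihood factor could take a form such as
\[
p(y_i \mid \mathbf{x}_i) \;=\; (1-\alpha)\,\phi_\sigma\!\left(y_i - \mathbf{x}_i^{\top}\boldsymbol\beta\right) \;+\; \alpha\, g(y_i),
\]
where $\phi_\sigma$ is a Gaussian noise density for correctly matched pairs and $g$ is a density for mismatched responses.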
Regularization for Shuffled Data Problems via Exponential Family Priors on the Permutation Group
TLDR
A flexible exponential-family prior on the permutation group is proposed for this purpose; it can be used to integrate various structures, such as sparse and locally constrained shuffling, and compares favorably to competing methods.
Linear regression with partially mismatched data: local search with theoretical guarantees
Linear regression is a fundamental modeling tool in statistics and related fields. In this paper, we study an important variant of linear regression in which the predictor-response pairs are partially mismatched.
An Algebraic-Geometric Approach to Shuffled Linear Regression
TLDR
Using the machinery of algebraic geometry, it is proved that as long as the independent samples are generic, this polynomial system is always consistent with at most $n!$ complex roots, regardless of any type of corruption inflicted on the observations.
A Two-Stage Approach to Multivariate Linear Regression with Sparsely Mismatched Data
TLDR
It is shown that the conditions for permutation recovery become considerably less stringent as the number of responses $m$ per observation increases, and the required signal-to-noise ratio no longer depends on the sample size $n$.
An Algebraic-Geometric Approach for Linear Regression Without Correspondences
TLDR
The machinery of algebraic geometry, which uses symmetric polynomials to extract permutation-invariant constraints that the parameters of the linear regression model must satisfy, is used to prove that as long as the independent samples are generic, this polynomial system is always consistent with at most $n!$ complex roots, regardless of any type of corruption inflicted on the observations.
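To make the quoted result more concrete, here is a rough sketch of the permutation-invariant constraints in the noiseless case (the papers' exact formulation may differ): since permuting the responses leaves their power sums unchanged, the parameter vector $\mathbf{w}$ must satisfy
\[
\sum_{i=1}^{n} y_i^{k} \;=\; \sum_{i=1}^{n} \left(\mathbf{x}_i^{\top}\mathbf{w}\right)^{k}, \qquad k = 1, \dots, n,
\]
a polynomial system in $\mathbf{w}$ whose generic root count underlies the $n!$ bound above.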
Homomorphic Sensing
TLDR
An algebraic theory is developed which establishes conditions guaranteeing that points in the subspace are uniquely determined from their homomorphic image under some transformation in the set.
Algebraically-initialized Expectation Maximization for Header-free Communication
TLDR
This paper tackles the problem of shuffled linear regression for large-scale wireless sensor networks with header-free communication, using results from algebraic geometry together with an alternating optimization scheme to propose the Algebraically-Initialized Expectation Maximization algorithm.
Eigenspace conditions for homomorphic sensing
TLDR
It is shown that these eigenspace conditions are true when the endomorphisms are permutations composed with coordinate projections, leading to an abstract proof of the recent unlabeled sensing theorem of Unnikrishnan et al.

References

Showing 1-10 of 23 references
Linear Regression with Shuffled Labels
Is it possible to perform linear regression on datasets whose labels are shuffled with respect to the inputs? We explore this question by proposing several estimators that recover the weights of a noisy linear model from shuffled labels.
Denoising linear models with permuted data
TLDR
This work focuses on the denoising problem and characterizes the minimax error rate up to logarithmic factors; it also provides an exact algorithm for the noiseless problem and demonstrates its performance on an image point-cloud matching task.
Linear regression with an unknown permutation: Statistical and computational limits
TLDR
This work analyzes the problem of permutation recovery in a random design setting in which the entries of the matrix A are drawn i.i.d. from a standard Gaussian distribution, and establishes sharp conditions on the SNR, sample size n, and dimension d under which Π* is exactly and approximately recoverable.
Learning from Label Proportions by Optimizing Cluster Model Selection
TLDR
The problem of learning from label proportions is defined and a solution based on clustering is presented, which empirically shows a better prediction performance than recent approaches based on probabilistic SVMs, Kernel k-Means or conditional exponential models.
Linear regression without correspondence
This article considers algorithmic and statistical aspects of linear regression when the correspondence between the covariates and the responses is unknown. First, a fully polynomial-time approximation scheme is given for the natural least squares optimization problem in any constant dimension.
Convex and scalable weakly labeled SVMs
TLDR
This paper focuses on SVMs and proposes the WELLSVM via a novel label generation strategy, which leads to a convex relaxation of the original MIP that is at least as tight as existing convex semidefinite programming (SDP) relaxations.
∝SVM for Learning with Label Proportions
TLDR
A new method, ∝SVM, is proposed, which explicitly models the latent unknown instance labels together with the known group label proportions in a large-margin framework and outperforms the state of the art, especially for larger group sizes.
beta-risk: a New Surrogate Risk for Learning from Weakly Labeled Data
TLDR
This paper introduces the beta-risk as a generalized formulation of the standard empirical risk based on surrogate margin-based loss functions and proposes a soft-margin beta-SVM algorithm that performs better than the state of the art.
Maximum Likelihood Signal Amplitude Estimation Based on Permuted Blocks of Differently Binary Quantized Observations of a Signal in Noise
Parameter estimation based on binary quantized observations is considered, given that the estimation system does not know which of a set of quantizers was used, without replacement, for each block of observations.
Weakly Supervised Learning of Part-Based Spatial Models for Visual Object Recognition
TLDR
This paper investigates a new method of learning part-based models for visual object recognition, from training data that only provides information about class membership (and not object location or configuration), and shows that this weakly supervised technique produces better results.