Corpus ID: 222141799

Detecting approximate replicate components of a high-dimensional random vector with latent structure

Xin Bing, Florentina Bunea, Marten H. Wegkamp
arXiv: Methodology
High-dimensional feature vectors are likely to contain sets of measurements that are approximate replicates of one another. In complex applications, or in automated data collection, these feature sets are not known a priori and need to be determined. This work proposes a class of latent factor models on the observed high-dimensional random vector $X \in \mathbb{R}^p$ for defining, identifying, and estimating the index set of its approximately replicate components. The model class is parametrized…
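As a rough numerical illustration of the setting (a sketch only, not the estimator proposed in the paper), one can simulate data from a latent factor model $X = AZ + E$ in which a few rows of the loading matrix are near duplicates, and flag approximately replicate coordinates by thresholding pairwise sample correlations; the dimensions, noise level, and the 0.95 threshold are all assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
p, K, n = 20, 10, 2000

# Loading matrix A; rows 0-2 are (approximately) replicates of one another.
A = rng.normal(size=(p, K))
A[1] = A[0]                               # exact replicate of coordinate 0
A[2] = A[0] + 0.05 * rng.normal(size=K)   # approximate replicate

Z = rng.normal(size=(n, K))               # latent factors
E = 0.1 * rng.normal(size=(n, p))         # idiosyncratic noise
X = Z @ A.T + E                           # n observations of X in R^p

# Flag coordinate pairs whose absolute sample correlation exceeds a threshold.
R = np.corrcoef(X, rowvar=False)
pairs = [(i, j) for i in range(p) for j in range(i + 1, p) if abs(R[i, j]) > 0.95]
```

Under the paper's model the replicate set is defined through the loading matrix itself; the correlation screen above is only a naive proxy that works when the idiosyncratic noise is small relative to the signal.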

Figures and Tables from this paper

Inference in latent factor regression with clusterable features
Regression models in which the observed features $X \in \mathbb{R}^p$ and the response $Y \in \mathbb{R}$ depend, jointly, on a lower-dimensional, unobserved latent vector $Z \in \mathbb{R}^K$, with $K \ll p$, are popular in a large array of applications.


Adaptive estimation in structured factor models with applications to overlapping clustering
This work introduces a novel estimation method, called LOVE, for the entries and structure of a loading matrix $A$ in the sparse latent factor model $X = AZ + E$, for an observable random vector $X \in \mathbb{R}^p$.
Optimal estimation of sparse topic models
Empirical results show that the proposed estimator is a strong competitor of existing state-of-the-art algorithms for both non-sparse $A$ and sparse $A$, and has superior performance in many scenarios of interest.
Tensor decompositions for learning latent variable models
A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices, and implies a robust and computationally tractable estimation approach for several popular latent variable models.
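To make the tensor power method concrete, here is a minimal (non-robust) sketch for a symmetric, orthogonally decomposable third-order tensor $T = \sum_i \lambda_i v_i^{\otimes 3}$; the restarts, deflation, and perturbation control that the paper analyzes are omitted.

```python
import numpy as np

def tensor_power(T, n_iter=100, seed=0):
    """Power iteration u <- T(I, u, u) / ||T(I, u, u)|| for a symmetric 3-tensor.

    For an orthogonally decomposable tensor this converges, from a generic
    start, to one component v_i; the associated eigenvalue is T(u, u, u).
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=T.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        u = np.einsum('ijk,j,k->i', T, u, u)  # the multilinear map T(I, u, u)
        u /= np.linalg.norm(u)
    return np.einsum('ijk,i,j,k->', T, u, u, u), u

# Build an orthogonally decomposable tensor from orthonormal components.
d, lams = 4, np.array([5.0, 3.0, 1.0])
V = np.linalg.qr(np.random.default_rng(1).normal(size=(d, d)))[0][:, :3]
T = sum(l * np.einsum('i,j,k->ijk', v, v, v) for l, v in zip(lams, V.T))

lam, u = tensor_power(T)
```

Repeating with many random restarts and deflating $T \leftarrow T - \lambda_i v_i^{\otimes 3}$ recovers all components; the robust variant analyzed in the paper additionally controls perturbations of $T$, which is what the Wedin-type theorem addresses.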
Concentration of measure and spectra of random matrices: Applications to correlation matrices, elliptical distributions and beyond
We place ourselves in the setting of high-dimensional statistical inference, where the number of variables $p$ in a data set of interest is of the same order of magnitude as the number of observations $n$.
Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions
A general theorem is derived that bounds the Frobenius-norm error of an estimator for a class of high-dimensional matrix decomposition problems, obtained by solving a convex optimization problem that combines the nuclear norm with a general decomposable regularizer.
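A minimal instance of such a convex program, sketched here with the $\ell_1$ norm as the decomposable regularizer (so the target is a low-rank-plus-sparse decomposition), can be solved by alternating proximal gradient steps; the penalty weights, step size, and test matrix below are illustrative assumptions, not the paper's choices.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the prox operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    """Entrywise soft thresholding: the prox operator of tau * l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def low_rank_plus_sparse(Y, lam_L=0.5, lam_S=0.05, step=0.5, n_iter=300):
    """Minimize 0.5*||Y - L - S||_F^2 + lam_L*||L||_* + lam_S*||S||_1."""
    L = np.zeros_like(Y)
    S = np.zeros_like(Y)
    for _ in range(n_iter):
        L = svt(L - step * (L + S - Y), step * lam_L)   # prox step in L
        S = soft(S - step * (L + S - Y), step * lam_S)  # prox step in S
    return L, S

# Observation: a rank-1 matrix corrupted by a few large sparse entries.
rng = np.random.default_rng(0)
n = 30
L0 = np.outer(rng.normal(size=n), rng.normal(size=n))
S0 = np.zeros((n, n))
S0[rng.integers(0, n, 20), rng.integers(0, n, 20)] = 5.0
Y = L0 + S0
L, S = low_rank_plus_sparse(Y)
```

At the optimum the residual $Y - L - S$ is entrywise bounded by `lam_S`, so the fitted pair nearly reproduces the observation while the two penalties push $L$ toward low rank and $S$ toward sparsity.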
A Spectral Algorithm for Latent Dirichlet Allocation
This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of multi-view models and topic models, including latent Dirichlet allocation (LDA).
Large covariance estimation by thresholding principal orthogonal complements
A general Principal Orthogonal complEment Thresholding (POET) framework for large-scale covariance matrix estimation based on the approximate factor model is proposed, which allows exploiting a conditional-sparsity covariance structure even for heavy-tailed data.
On the sample covariance matrix estimator of reduced effective rank population matrices, with applications to fPCA
This work provides a unified analysis of the properties of the sample covariance matrix $\Sigma_n$ over the class of $p \times p$ population covariance matrices $\Sigma$ of reduced effective rank.
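The key quantity here is the effective rank $r(\Sigma) = \operatorname{tr}(\Sigma)/\|\Sigma\|_{\mathrm{op}}$, which can be far smaller than the ambient dimension $p$; a small sketch (the spiked covariance below is an illustrative assumption):

```python
import numpy as np

def effective_rank(S):
    """Effective rank: trace(S) divided by the operator (spectral) norm of S."""
    return np.trace(S) / np.linalg.norm(S, 2)

# Spiked covariance in dimension p = 50: one strong direction + weak isotropic noise.
p = 50
v = np.zeros(p)
v[0] = 1.0
Sigma = 10.0 * np.outer(v, v) + 0.1 * np.eye(p)

# Eigenvalues are 10.1 (once) and 0.1 (49 times), so
# r(Sigma) = (10.1 + 49 * 0.1) / 10.1 = 15 / 10.1, about 1.49, despite p = 50.
r = effective_rank(Sigma)
```

Error bounds phrased in terms of $r(\Sigma)$ rather than $p$ are what make the analysis useful for functional PCA, where the spectrum decays quickly.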
Model assisted variable clustering: Minimax-optimal recovery and algorithms
Model-based clustering defines population-level clusters relative to a model that embeds notions of similarity. Algorithms tailored to such models yield estimated clusters with a clear statistical interpretation.
Learning Binary Latent Variable Models: A Tensor Eigenpair Approach
This paper proposes a novel spectral approach to latent variable models with hidden binary units, based on the eigenvectors of both the second-order moment matrix and the third-order moment tensor of the observed data, and proves that, under mild non-degeneracy conditions, the method consistently estimates the model parameters at the optimal parametric rate.