Corpus ID: 222141799

Detecting approximate replicate components of a high-dimensional random vector with latent structure

Xin Bing, Florentina Bunea, Marten H. Wegkamp
arXiv: Methodology
High-dimensional feature vectors are likely to contain sets of measurements that are approximate replicates of one another. In complex applications or automated data collection, these feature sets are not known a priori and need to be determined. This work proposes a class of latent factor models on the observed high-dimensional random vector $X \in \mathbb{R}^p$ for defining, identifying and estimating the index set of its approximately replicate components. The model class is parametrized…
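To make the notion of "approximate replicate components" concrete, here is a minimal NumPy sketch (not the paper's estimator: the dimensions, the loading matrix, the noise level and the naive correlation-thresholding rule are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration: build X in R^p whose coordinates 0-2 are
# approximate replicates of one latent factor and coordinates 3-4 of another.
n, p = 2000, 6
z = rng.normal(size=(n, 2))                  # two latent factors
A = np.zeros((p, 2))
A[0, 0] = A[1, 0] = A[2, 0] = 1.0            # coordinates 0-2 load only on factor 1
A[3, 1] = A[4, 1] = 1.0                      # coordinates 3-4 load only on factor 2
A[5, :] = [0.5, 0.5]                         # a mixed, non-replicate coordinate
X = z @ A.T + 0.1 * rng.normal(size=(n, p))  # small noise -> "approximate" replicates

# Naive detection rule (for illustration only): flag index pairs whose
# sample correlation is near +/-1.
R = np.corrcoef(X, rowvar=False)
pairs = [(i, j) for i in range(p) for j in range(i + 1, p)
         if abs(R[i, j]) > 0.95]
print(pairs)  # pairs within {0, 1, 2} and {3, 4}; the mixed coordinate 5 is excluded
```

The point of the sketch is only what the model encodes: replicate coordinates share a latent factor up to small noise, so their pairwise correlations sit near one, while a mixed coordinate does not.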


Inference in latent factor regression with clusterable features
This work develops inferential tools for β in a class of factor regression models in which the observed features are signed mixtures of the latent factors, and provides a statistical platform for inference in regression on latent cluster centers, thereby increasing the scope of the theoretical results.
Blessing of Dependence: Identifiability and Geometry of Discrete Models with Multiple Binary Latent Variables
  • Yuqi Gu
  • Computer Science, Mathematics
  • 2022
This work presents a general algebraic technique to investigate identifiability of complicated discrete models with latent and graphical components, and reveals an interesting and perhaps surprising phenomenon of blessing-of-dependence geometry.
Bernoulli (journal of the Bernoulli Society for Mathematical Statistics and Probability), Volume 28, Number 2, May 2022
A list of forthcoming papers can be found online at http://www.bernoullisociety.org/index.php/publications/bernoulli-journal/bernoulli-journal-papers


Adaptive estimation in structured factor models with applications to overlapping clustering
This work introduces a novel estimation method, called LOVE, for the entries and structure of the loading matrix $A$ in the sparse latent factor model $X = AZ + E$, for an observable random vector $X \in \mathbb{R}^p$.
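The model $X = AZ + E$ above can be simulated directly; the following sketch (illustrative values only, not LOVE itself) shows the structure such estimators exploit, where rows of $A$ with a single nonzero act as "pure" variables and the population covariance inherits the factor structure:

```python
import numpy as np

rng = np.random.default_rng(1)

# Sparse latent factor model X = A Z + E (dimensions and loadings are
# illustrative assumptions, not taken from the paper).
p, K, n = 5, 2, 5000
A = np.array([[1.0, 0.0],   # "pure" variable: loads on factor 1 only
              [1.0, 0.0],
              [0.0, 1.0],   # "pure" variable: loads on factor 2 only
              [0.0, 1.0],
              [0.7, 0.3]])  # overlapping variable: loads on both factors
Z = rng.normal(size=(n, K))
E = 0.2 * rng.normal(size=(n, p))
X = Z @ A.T + E

# With Cov(Z) = I and Cov(E) = 0.04 I, the population covariance is
# A A^T + 0.04 I; the sample covariance should be close for large n.
S = np.cov(X, rowvar=False)
Sigma = A @ A.T + 0.04 * np.eye(p)
print(np.max(np.abs(S - Sigma)))  # small for large n
```

The off-diagonal blocks of $\Sigma = A\,\mathrm{Cov}(Z)A^\top + \mathrm{Cov}(E)$ are what make the columns and sparsity pattern of $A$ recoverable from data.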
Optimal estimation of sparse topic models
Empirical results show that the proposed estimator is a strong competitor of existing state-of-the-art algorithms for both non-sparse $A$ and sparse $A$, and has superior performance in many scenarios of interest.
Tensor decompositions for learning latent variable models
A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices, and implies a robust and computationally tractable estimation approach for several popular latent variable models.
Concentration of measure and spectra of random matrices: Applications to correlation matrices, elliptical distributions and beyond
We place ourselves in the setting of high-dimensional statistical inference, where the number of variables $p$ in a data set of interest is of the same order of magnitude as the number of observations.
Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions
A general theorem is derived that bounds the Frobenius-norm error of an estimated matrix pair in high-dimensional matrix decomposition problems, obtained by solving a convex optimization problem that combines the nuclear norm with a general decomposable regularizer.
A Spectral Algorithm for Latent Dirichlet Allocation
This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of multi-view models and topic models, including latent Dirichlet allocation (LDA).
Large covariance estimation by thresholding principal orthogonal complements
A general Principal Orthogonal complEment Thresholding (POET) framework for large-scale covariance matrix estimation based on the approximate factor model is proposed; it allows one to exploit a conditional sparsity covariance structure for heavy-tailed data.
On the sample covariance matrix estimator of reduced effective rank population matrices, with applications to fPCA
This work provides a unified analysis of the properties of the sample covariance matrix $\Sigma_n$ over the class of $p\times p$ population covariance matrices $\Sigma$ of reduced effective rank.
Model assisted variable clustering: Minimax-optimal recovery and algorithms
The class of G-block covariance models is introduced as a background model for variable clustering, and the analysis of the PECOK algorithm, which is based on a corrected convex relaxation of the popular K-means algorithm, provides the first statistical analysis of such algorithms for variable clustering.
Learning Binary Latent Variable Models: A Tensor Eigenpair Approach
This paper proposes a novel spectral approach to latent variable models with hidden binary units based on the eigenvectors of both the second order moment matrix and third order moment tensor of the observed data, and proves that under mild non-degeneracy conditions, the method consistently estimates the model parameters at the optimal parametric rate.