# Detecting approximate replicate components of a high-dimensional random vector with latent structure

@article{Bing2020DetectingAR, title={Detecting approximate replicate components of a high-dimensional random vector with latent structure}, author={Xin Bing and Florentina Bunea and Marten H. Wegkamp}, journal={arXiv: Methodology}, year={2020} }

High-dimensional feature vectors are likely to contain sets of measurements that are approximate replicates of one another. In complex applications, or automated data collection, these feature sets are not known a priori, and need to be determined. This work proposes a class of latent factor models on the observed high-dimensional random vector $X \in \mathbb{R}^p$, for defining, identifying and estimating the index set of its approximately replicate components. The model class is parametrized…

## One Citation

Inference in latent factor regression with clusterable features

- 2021

Regression models, in which the observed features $X \in \mathbb{R}^p$ and the response $Y \in \mathbb{R}$ depend, jointly, on a lower dimensional, unobserved, latent vector $Z \in \mathbb{R}^K$, with $K < p$, are popular in a large array of…

## References

Adaptive estimation in structured factor models with applications to overlapping clustering

- Mathematics
- 2017

This work introduces a novel estimation method, called LOVE, of the entries and structure of a loading matrix $A$ in a sparse latent factor model $X = AZ + E$, for an observable random vector $X$ in $\mathbb{R}^p$,…
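
The model $X = AZ + E$ named in this entry can be illustrated with a small simulation. The sketch below is not the LOVE estimator. It only draws data from a sparse latent factor model and checks that the sample covariance is close to $A A^\top + \sigma^2 I$, which holds under the assumptions made here: standard normal $Z$, independent Gaussian noise $E$ with standard deviation $\sigma$, and a hypothetical loading matrix `A`. The function name `simulate_factor_model` and all numeric values are illustrative choices, not taken from the paper.

```python
import numpy as np

def simulate_factor_model(n, A, noise_sd=0.1, seed=0):
    """Draw n samples of X = A Z + E (assumed: Z ~ N(0, I), E ~ N(0, noise_sd^2 I))."""
    rng = np.random.default_rng(seed)
    p, K = A.shape
    Z = rng.standard_normal((n, K))              # latent factors, one row per sample
    E = noise_sd * rng.standard_normal((n, p))   # independent noise, one row per sample
    return Z @ A.T + E

# Hypothetical sparse loading matrix: p = 6 observed features, K = 2 latent factors.
# Rows 0-1 load only on factor 0, rows 2-3 only on factor 1,
# row 4 loads on both factors, and row 5 is pure noise.
A = np.array([
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
    [0.5, 0.5],
    [0.0, 0.0],
])

noise_sd = 0.1
X = simulate_factor_model(n=20000, A=A, noise_sd=noise_sd)

# Under the stated assumptions, Cov(X) = A A^T + noise_sd^2 * I.
sample_cov = np.cov(X, rowvar=False)
model_cov = A @ A.T + noise_sd**2 * np.eye(A.shape[0])
max_abs_error = np.max(np.abs(sample_cov - model_cov))
print(f"largest entrywise covariance error: {max_abs_error:.4f}")
```

Rows of `A` with identical loadings (rows 0 and 1 here) produce features that differ only by noise, which is the kind of structure a sparse loading matrix makes visible in the covariance.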

Optimal estimation of sparse topic models

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2020

Empirical results show that the proposed estimator is a strong competitor of the existing state-of-the-art algorithms for both non-sparse $A$ and sparse $A$, and has superior performance in many scenarios of interest.

Tensor decompositions for learning latent variable models

- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2014

A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices; this implies a robust and computationally tractable estimation approach for several popular latent variable models.

Concentration of measure and spectra of random matrices: Applications to correlation matrices, elliptical distributions and beyond

- Mathematics
- 2009

We place ourselves in the setting of high-dimensional statistical inference, where the number of variables $p$ in a data set of interest is of the same order of magnitude as the number of…

Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions

- Mathematics, Computer Science
- ICML
- 2011

A general theorem is derived that bounds the Frobenius norm error for an estimate of the pair of high-dimensional matrix decomposition problems obtained by solving a convex optimization problem that combines the nuclear norm with a general decomposable regularizer.

A Spectral Algorithm for Latent Dirichlet Allocation

- Computer Science, Mathematics
- Algorithmica
- 2014

This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of multi-view models and topic models, including latent Dirichlet allocation (LDA).

Large covariance estimation through elliptical factor models

- Medicine, Mathematics
- Annals of Statistics
- 2018

A general Principal Orthogonal complEment Thresholding (POET) framework for large-scale covariance matrix estimation based on the approximate factor model is proposed; it allows exploiting the conditional sparsity covariance structure for heavy-tailed data.

On the sample covariance matrix estimator of reduced effective rank population matrices, with applications to fPCA

- Mathematics
- 2012

This work provides a unified analysis of the properties of the sample covariance matrix $\Sigma_n$ over the class of $p\times p$ population covariance matrices $\Sigma$ of reduced effective rank…

Model assisted variable clustering: Minimax-optimal recovery and algorithms

- Mathematics
- 2015

Model-based clustering defines population level clusters relative to a model that embeds notions of similarity. Algorithms tailored to such models yield estimated clusters with a clear statistical…

Learning Binary Latent Variable Models: A Tensor Eigenpair Approach

- Mathematics, Computer Science
- ICML
- 2018

This paper proposes a novel spectral approach to latent variable models with hidden binary units based on the eigenvectors of both the second order moment matrix and third order moment tensor of the observed data, and proves that under mild non-degeneracy conditions, the method consistently estimates the model parameters at the optimal parametric rate.