Corpus ID: 251403172

Sparse semiparametric discriminant analysis for high-dimensional zero-inflated data

Hee Cheol Chung, Yang Ni, Irina Gaynanova
Sequencing-based technologies provide an abundance of high-dimensional biological datasets with skewed and zero-inflated measurements. Classification of such data with linear discriminant analysis leads to poor performance due to the violation of the Gaussian distribution assumption. To address this limitation, we propose a new semiparametric discriminant analysis framework based on the truncated latent Gaussian copula model that accommodates both skewness and zero inflation. By applying… 
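Simulating from the model makes it concrete. The sketch below assumes a bivariate case. An observed coordinate is zero whenever its latent Gaussian coordinate falls at or below a threshold `delta[j]`. Otherwise it equals `exp(Z_j)`. The exponential is a hypothetical choice of monotone transformation, used only to produce skewness, and the function name is illustrative rather than taken from the paper.

```python
import math
import random


def sample_truncated_copula(n, rho, delta, seed=0):
    """Draw n pairs from a bivariate truncated latent Gaussian copula model.

    The latent pair (Z1, Z2) is standard bivariate normal with correlation rho.
    Coordinate j is observed as 0 when Z_j <= delta[j] (zero inflation) and as
    exp(Z_j) otherwise; exp stands in for the unknown monotone transformation.
    """
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # z2 is marginally N(0, 1) with Corr(z1, z2) = rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        sample.append(tuple(
            math.exp(z) if z > d else 0.0 for z, d in zip((z1, z2), delta)
        ))
    return sample


data = sample_truncated_copula(20000, rho=0.5, delta=(0.0, 1.0))
zero_frac = [sum(row[j] == 0.0 for row in data) / len(data) for j in range(2)]
print(zero_frac)  # approximately [Phi(0), Phi(1)] = [0.5, 0.841]
```

Raising `delta[j]` increases the share of zeros in coordinate j. The positive values are right-skewed because of the exponential transformation, so both features named in the abstract show up together.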



Sparse semiparametric canonical correlation analysis for data of mixed types

This work proposes a new approach to sparse canonical correlation analysis of mixed data types that does not require explicit parametric assumptions. It uses a truncated latent Gaussian copula to model data with excess zeros and derives a rank-based estimator of the latent correlation matrix.
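Rank-based estimators of this kind start from Kendall's tau computed on the observed data. The sketch below uses the τ_a version, which is assumed here to be the relevant statistic. Pairs tied at zero contribute nothing, so under zero inflation τ_a cannot reach ±1 even for perfectly concordant data. In such models the mapping from τ_a to the latent correlation (a bridge function, not shown) therefore depends on the proportions of zeros.

```python
def kendall_tau_a(x, y):
    """Kendall's tau-a: the average of sign(x_i - x_k) * sign(y_i - y_k)
    over all n(n-1)/2 pairs. Tied pairs (for example two zeros) contribute 0,
    and no tie correction is applied."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")

    def sign(v):
        return (v > 0) - (v < 0)

    total = 0
    for i in range(n):
        for k in range(i + 1, n):
            total += sign(x[i] - x[k]) * sign(y[i] - y[k])
    return 2.0 * total / (n * (n - 1))


# Zero-inflated toy data: pairs tied at zero pull tau-a toward 0.
print(kendall_tau_a([0.0, 0.0, 1.5, 2.0, 3.1], [0.0, 0.4, 0.0, 2.2, 5.0]))  # 0.6
print(kendall_tau_a([0.0, 0.0, 1.0], [0.0, 0.0, 1.0]))  # 0.6666666666666666, not 1
```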

Trace Ratio Optimization for High-Dimensional Multi-Class Discrimination

Empirical examples with simulated and real datasets suggest that the proposed method performs well and often outperforms existing approaches across a wide range of problems, in terms of both variable selection and classification accuracy.

A direct approach to sparse discriminant analysis in ultra-high dimensions

The theory shows that the proposed method can consistently identify the subset of discriminative features contributing to the Bayes rule and, at the same time, consistently estimate the Bayes classification direction, even when the dimension grows faster than any polynomial order of the sample size.
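Consider two Gaussian classes with a common covariance matrix Σ and means μ1 and μ2. Their Bayes rule is linear, with classification direction β = Σ⁻¹(μ2 − μ1), and the discriminative features are the ones whose entries of β are nonzero. The sketch below computes this population quantity for a small, known Σ. It is not the sparse estimator from the paper, and the helper names are hypothetical. The example shows that a feature with no mean difference can still enter the Bayes rule through its correlation with another feature.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def bayes_direction(sigma, mu1, mu2):
    """beta = Sigma^{-1} (mu2 - mu1) for two classes sharing covariance Sigma."""
    return solve(sigma, [b - a for a, b in zip(mu1, mu2)])


def classify(x, beta, mu1, mu2, prior1=0.5):
    """Class 2 if (x - (mu1 + mu2)/2)' beta + log(pi2 / pi1) > 0, else class 1."""
    score = sum(bj * (xj - (a + b) / 2) for bj, xj, a, b in zip(beta, x, mu1, mu2))
    score += math.log((1.0 - prior1) / prior1)
    return 2 if score > 0 else 1


sigma = [[1.0, 0.5], [0.5, 1.0]]
mu1, mu2 = [0.0, 0.0], [1.0, 0.0]  # feature 2 has no mean difference
beta = bayes_direction(sigma, mu1, mu2)
print(beta)  # approximately [1.333, -0.667]: feature 2 still enters the rule
print(classify([1.0, 0.0], beta, mu1, mu2))  # 2
```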

High dimensional semiparametric latent graphical model for mixed data

A unified rank-based approach to estimating the correlation matrix of the latent variables is proposed, and a concentration inequality for the resulting rank-based estimator is established. With this estimator, precision matrix estimation and graph recovery achieve the same rates of convergence as if the latent variables were observed.
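In a (latent) Gaussian graphical model, variables j and k are joined by an edge exactly when the (j, k) entry of the precision matrix Ω = Σ⁻¹ is nonzero. Once the latent correlation matrix has been estimated, graph recovery therefore reduces to precision matrix estimation. The toy sketch below inverts a known 3 × 3 correlation matrix directly. In high dimensions, a sparse estimator such as the graphical lasso would take the place of the exact inverse. The function names are hypothetical.

```python
def inverse_3x3(m):
    """Invert a 3x3 matrix with the adjugate (transposed cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    cof = [
        [e * i - f * h, -(d * i - f * g), d * h - e * g],
        [-(b * i - c * h), a * i - c * g, -(a * h - b * g)],
        [b * f - c * e, -(a * f - c * d), a * e - b * d],
    ]
    det = a * cof[0][0] + b * cof[0][1] + c * cof[0][2]
    if det == 0:
        raise ValueError("matrix is singular")
    return [[cof[col][row] / det for col in range(3)] for row in range(3)]


def graph_edges(corr, tol=1e-10):
    """Edges (j, k), j < k, whose precision-matrix entry exceeds tol in magnitude."""
    omega = inverse_3x3(corr)
    return [(j, k) for j in range(3) for k in range(j + 1, 3) if abs(omega[j][k]) > tol]


# Latent correlation rho^|j-k| with rho = 0.5 describes a chain 0 - 1 - 2.
corr = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
print(graph_edges(corr))  # [(0, 1), (1, 2)]
```

Variables 0 and 2 are marginally correlated (0.25) but are not joined by an edge, because they are conditionally independent given variable 1.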

Scale-Invariant Sparse PCA on High-Dimensional Meta-Elliptical Data

  • Fang Han, Han Liu
  • Journal of the American Statistical Association
  • 2014
This work proposes a semiparametric method for scale-invariant sparse principal component analysis (PCA) on high-dimensional non-Gaussian data. The method outperforms most competing methods on both synthetic and real-world datasets.
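Scale invariance of this kind usually comes from estimating a rank-based latent correlation matrix instead of using the sample covariance. For continuous elliptical (including Gaussian) vectors, Kendall's tau and the correlation r satisfy τ = (2/π) arcsin(r), so r can be estimated by sin(πτ̂/2). The sketch below assumes that estimator, a common choice for meta-elliptical data, and covers only the correlation step, not the sparse PCA itself. It shows that strictly increasing transformations of the margins, including rescaling, leave the estimate unchanged. The names are illustrative.

```python
import math
import random


def kendall_tau(x, y):
    """Kendall's tau over all pairs; assumes continuous data without ties."""
    n = len(x)
    s = 0
    for i in range(n):
        for k in range(i + 1, n):
            p = (x[i] - x[k]) * (y[i] - y[k])
            s += (p > 0) - (p < 0)
    return 2.0 * s / (n * (n - 1))


def latent_corr(x, y):
    """Rank-based correlation estimate sin(pi * tau / 2)."""
    return math.sin(math.pi * kendall_tau(x, y) / 2.0)


rng = random.Random(1)
rho, n = 0.6, 600
z1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
z2 = [rho * a + math.sqrt(1 - rho**2) * rng.gauss(0.0, 1.0) for a in z1]
# Skewed, rescaled margins: exp and a scaled cube change the scale but not
# the ranks, so the rank-based estimate is identical.
x = [math.exp(a) for a in z1]
y = [1000.0 * b**3 for b in z2]
print(latent_corr(z1, z2), latent_corr(x, y))  # the same value, near 0.6
```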

Semiparametric Gaussian Copula Regression modeling for Mixed Data Types (SGCRM)

Semiparametric Gaussian Copula Regression modeling (SGCRM) is proposed. It models the joint dependence structure among observed continuous, truncated, ordinal, and binary variables and constructs conditional models with any of these four data types as the outcome, with a guarantee that the derived conditional models are mutually consistent.
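A latent Gaussian pair illustrates the consistency property. As an assumption of this sketch, let (Z1, Z2) be standard bivariate normal with correlation ρ, let the binary outcome be Y = 1{Z1 > Δ}, and let Z2 be the latent predictor. Then Z1 | Z2 = z ~ N(ρz, 1 − ρ²), so P(Y = 1 | Z2 = z) = Φ((ρz − Δ)/√(1 − ρ²)). Every conditional model comes from the same joint distribution, so averaging this conditional probability over Z2 must recover the marginal P(Y = 1) = 1 − Φ(Δ). The code checks this numerically. It does not reproduce SGCRM's estimation procedure, and the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def prob_y1_given_z2(z2, rho, delta):
    """P(Y = 1 | Z2 = z2) for Y = 1{Z1 > delta}, where (Z1, Z2) is standard
    bivariate normal with correlation rho, |rho| < 1."""
    sd = (1.0 - rho * rho) ** 0.5  # conditional sd of Z1 given Z2
    return STD_NORMAL.cdf((rho * z2 - delta) / sd)


def marginal_from_conditional(rho, delta, steps=4000, lim=8.0):
    """Average the conditional probability over Z2 ~ N(0, 1) (midpoint rule)."""
    h = 2.0 * lim / steps
    total = 0.0
    for s in range(steps):
        z = -lim + (s + 0.5) * h
        total += STD_NORMAL.pdf(z) * prob_y1_given_z2(z, rho, delta) * h
    return total


rho, delta = 0.7, 0.5
print(marginal_from_conditional(rho, delta))  # approximately 0.3085
print(1.0 - STD_NORMAL.cdf(delta))            # marginal P(Y = 1), approximately 0.3085
```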

Statistical Learning with Sparsity: The Lasso and Generalizations

Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data and extract useful and reproducible patterns from big datasets.
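Cyclic coordinate descent is a standard solver for the lasso objective (1/(2n))‖y − Xβ‖² + λ‖β‖₁. It updates one coefficient at a time by soft-thresholding that coefficient's partial-residual correlation. Below is a minimal pure-Python sketch that assumes no intercept and no column standardization. The function names are illustrative.

```python
def soft_threshold(v, lam):
    """S(v, lam) = sign(v) * max(|v| - lam, 0)."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/(2n)) ||y - X b||^2 + lam * ||b||_1.

    X is a list of n rows with p features; no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # y - X b, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # x_j' (partial residual without feature j) / n
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new
    return b


X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
y = [3.0, -3.0, 0.2, -0.2]
print(lasso_cd(X, y, lam=0.5))  # [2.0, 0.0]: the weak feature is set exactly to 0
```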

High-dimensional Mixed Graphical Model with Ordinal Data: Parameter Estimation and Statistical Inference

A flexible model, the Latent Mixed Gaussian Copula Model, is proposed to handle mixed data that include ordinal variables, by assuming that the observed ordinal variables are generated by latent variables.
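Thresholding is a common way to formalize "ordinal variables generated by latent variables." An ordinal variable with K levels records how many of K − 1 increasing cut points its latent Gaussian variable exceeds. Then P(level < k) = Φ(Δ_k), so each cut point can be estimated from the observed cumulative proportions. The sketch below assumes this construction, standard normal latent variables, and hypothetical cut points. It is not presented as the paper's estimator.

```python
import random
from statistics import NormalDist


def ordinal_from_latent(z, thresholds):
    """Observed level = number of increasing thresholds that latent z exceeds."""
    return sum(z > t for t in thresholds)


def estimate_thresholds(levels, n_levels):
    """Estimate threshold k (k = 1..n_levels-1) as Phi^{-1}(share of levels < k).

    Raises statistics.StatisticsError if some share is exactly 0 or 1.
    """
    inv = NormalDist().inv_cdf
    n = len(levels)
    return [inv(sum(v < k for v in levels) / n) for k in range(1, n_levels)]


rng = random.Random(2)
true_thresholds = [-0.5, 0.8]  # hypothetical cut points, giving 3 levels
levels = [ordinal_from_latent(rng.gauss(0.0, 1.0), true_thresholds) for _ in range(20000)]
print(estimate_thresholds(levels, 3))  # close to [-0.5, 0.8]
```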

Discriminant analysis through a semiparametric model

We consider a semiparametric generalisation of normal-theory discriminant analysis. The semiparametric model assumes that, after unspecified univariate monotone transformations, the class…
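A normal-theory rule can be applied under such a model only after the unspecified monotone transformations are estimated. One simple estimator is the normal-scores transform, which replaces each value by Φ⁻¹ of its rank divided by n + 1. It is assumed here for illustration and is not necessarily the one used in the paper. Because it depends only on ranks, any strictly increasing transformation of the input produces the same scores. This matches the model's assumption that the transformations are unknown apart from being monotone.

```python
from statistics import NormalDist


def normal_scores(x):
    """Rank-based estimate of a monotone transformation to normality:
    the i-th smallest value is mapped to Phi^{-1}(i / (n + 1)).
    Assumes no ties (tied values would receive different scores)."""
    n = len(x)
    inv = NormalDist().inv_cdf
    scores = [0.0] * n
    for rank, idx in enumerate(sorted(range(n), key=lambda i: x[i]), start=1):
        scores[idx] = inv(rank / (n + 1))
    return scores


skewed = [0.5, 1000.0, 10.0, 2.0]
print(normal_scores(skewed))
print(normal_scores([-1.0, 3.0, 2.3, 0.7]))  # same ranks, hence identical scores
```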

latentcor: An R Package for estimating latent correlations from mixed data types

The R package latentcor implements a comprehensive set of semiparametric latent Gaussian copula models. It can estimate latent correlations between variables of any combination of continuous, binary, ternary, and zero-inflated (truncated) types, which makes latent correlation estimation readily available for modern high-throughput data analysis.