Xiaotong Shen

Variable selection in clustering analysis is both challenging and important. In the context of model-based clustering analysis with a common diagonal covariance matrix, which is especially suitable for “high dimension, low sample size” settings, we propose a penalized likelihood approach with an L1 penalty function, automatically realizing variable selection …
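To make the idea concrete, here is a minimal Python sketch (not the paper's implementation; function and parameter names are illustrative) of L1-penalized model-based clustering: an EM loop for a Gaussian mixture with a common diagonal covariance in which the M-step soft-thresholds the cluster means, so variables whose means shrink to zero in every cluster are effectively deselected.

import numpy as np

def soft_threshold(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def l1_penalized_gmm(X, K, lam, n_iter=100, seed=0):
    X = X - X.mean(axis=0)            # center: a zero mean marks a non-informative variable
    n, p = X.shape
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(n, K, replace=False)].copy()   # initialize means at random points
    sigma2 = X.var(axis=0) + 1e-6                    # common diagonal covariance
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: posterior cluster probabilities under diagonal Gaussians
        log_dens = np.empty((n, K))
        for k in range(K):
            log_dens[:, k] = (np.log(pi[k])
                              - 0.5 * np.sum(np.log(2 * np.pi * sigma2))
                              - 0.5 * np.sum((X - mu[k]) ** 2 / sigma2, axis=1))
        log_dens -= log_dens.max(axis=1, keepdims=True)
        resp = np.exp(log_dens)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: mixing weights, L1-soft-thresholded means, common diagonal variances
        nk = resp.sum(axis=0) + 1e-12
        pi = nk / n
        for k in range(K):
            raw_mean = resp[:, k] @ X / nk[k]
            mu[k] = soft_threshold(raw_mean, lam * sigma2 / nk[k])
        sigma2 = sum(resp[:, k] @ ((X - mu[k]) ** 2) for k in range(K)) / n + 1e-6
    selected = np.flatnonzero(np.abs(mu).sum(axis=0) > 0)   # variables kept by the L1 penalty
    return mu, pi, resp, selected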
High-dimensional regression/classification continues to be an important and challenging problem, especially when features are highly correlated. Feature selection, combined with additional structural information on the features, has been considered promising for improving regression/classification performance. Graph-guided fused lasso (GFlasso) has …
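As a rough illustration of the kind of penalty GFlasso builds on (the exact weighting in the paper may differ; the names below are hypothetical), the sketch combines an L1 sparsity term with a fusion term over the edges of a feature graph, so that coefficients of correlated, connected features are pulled toward (signed) equality.

import numpy as np

def gflasso_penalty(beta, edges, weights, signs, lam1, lam2):
    """edges: list of (j, k) feature pairs; weights: edge weights;
    signs: +1/-1 sign of the correlation between the paired features."""
    sparsity = lam1 * np.sum(np.abs(beta))
    fusion = lam2 * sum(w * abs(beta[j] - s * beta[k])
                        for (j, k), w, s in zip(edges, weights, signs))
    return sparsity + fusion

# Example: features 0 and 1 are positively correlated graph neighbors, so the
# fusion term penalizes any difference between their coefficients.
beta = np.array([1.2, 0.9, 0.0, -0.5])
print(gflasso_penalty(beta, edges=[(0, 1), (2, 3)],
                      weights=[1.0, 0.5], signs=[+1, -1],
                      lam1=0.1, lam2=0.2))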
In binary classification, margin-based techniques usually deliver high performance. As a result, a multicategory problem is often treated as a sequence of binary classifications. In the absence of a dominating class, this treatment may be suboptimal and may yield poor performance, such as for support vector machines (SVMs). We propose a novel multicategory …
The importance of a network-based approach to identifying biological markers for diagnostic classification and prognostic assessment in the context of microarray data has been increasingly recognized. To our knowledge, there have been few, if any, statistical tools that explicitly incorporate prior information from gene networks into classifier building. …
Many margin-based binary classification techniques, such as the support vector machine (SVM) and ψ-learning, deliver high performance. An earlier article proposed a new multicategory ψ-learning methodology that shows great promise in generalization ability. However, ψ-learning is computationally difficult because it requires handling a nonconvex minimization …
In high-dimensional data analysis, feature selection becomes one means of dimension reduction, which proceeds with parameter estimation. Concerning the accuracy of selection and estimation, we study nonconvex constrained and regularized likelihoods in the presence of nuisance parameters. Theoretically, we show that the constrained L0-likelihood and its …
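A small illustration of why the L0 and L1 penalties behave differently (assuming, for simplicity, an orthonormal design so both have closed-form solutions): the L0-penalized estimate is a hard-thresholding rule that leaves selected coefficients unshrunk, while the L1-penalized estimate soft-thresholds and therefore biases them.

import numpy as np

def hard_threshold(z, lam):
    # argmin_b 0.5*(z - b)^2 + lam * 1{b != 0}  ->  keep z if |z| > sqrt(2*lam)
    return np.where(np.abs(z) > np.sqrt(2 * lam), z, 0.0)

def soft_threshold(z, lam):
    # argmin_b 0.5*(z - b)^2 + lam * |b|  ->  shrink z toward zero by lam
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

z = np.array([3.0, 1.2, 0.4, -2.5])
print(hard_threshold(z, lam=1.0))   # [ 3.   0.   0.  -2.5] : survivors are not shrunk
print(soft_threshold(z, lam=1.0))   # [ 2.   0.2  0.  -1.5] : survivors are shrunk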
Extracting grouping structure or identifying homogeneous subgroups of predictors in regression is crucial for high-dimensional data analysis. A low-dimensional structure, in particular grouping, when captured in a regression model, enhances predictive performance and facilitates a model's interpretability. Grouping pursuit extracts homogeneous …
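As a toy illustration of the goal of grouping pursuit (a deliberate simplification, not the paper's algorithm; names are illustrative), the sketch below collapses estimated coefficients whose values are nearly identical into homogeneous groups and assigns each group a single shared value, reducing the effective dimension.

import numpy as np

def collapse_coefficients(beta, tol=1e-2):
    order = np.argsort(beta)
    groups, current = [], [order[0]]
    for idx in order[1:]:
        if beta[idx] - beta[current[-1]] <= tol:   # same homogeneous group
            current.append(idx)
        else:
            groups.append(current)
            current = [idx]
    groups.append(current)
    fused = beta.copy()
    for g in groups:                               # one shared value per group
        fused[g] = beta[g].mean()
    return groups, fused

beta_hat = np.array([0.51, 0.50, -0.2, 0.49, -0.21])
print(collapse_coefficients(beta_hat))   # two groups: {0, 1, 3} and {2, 4}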
Clustering analysis is one of the most widely used statistical tools in many emerging areas such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask underlying clustering structures. Hence removing noise variables via variable selection is necessary. For simultaneous variable selection …
We consider penalized linear regression, especially for "large p, small n" problems, for which the relationships among predictors are described a priori by a network. A class of motivating examples includes modeling a phenotype through gene expression profiles while accounting for coordinated functioning of genes in the form of biological pathways or …
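One common way to encode such prior network information, shown here only as an illustrative sketch (the paper's exact penalty may differ; all names are assumptions), is to add a Laplacian smoothness term to an L1-penalized least-squares objective, so that predictors connected in the network are encouraged to have similar, degree-scaled coefficients.

import numpy as np

def normalized_laplacian(A):
    d = A.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.maximum(d, 1e-12)), 0.0)
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def network_penalized_objective(beta, X, y, A, lam1, lam2):
    L = normalized_laplacian(A)            # prior network as a normalized Laplacian
    residual = y - X @ beta
    return (0.5 * residual @ residual
            + lam1 * np.sum(np.abs(beta))  # sparsity
            + lam2 * beta @ L @ beta)      # smoothness over network edges

# Tiny example: predictors 0 and 1 are linked in the prior network, so equal
# coefficients on them incur no quadratic penalty.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, 1.0, 0.0]) + 0.1 * rng.normal(size=20)
A = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], float)
print(network_penalized_objective(np.array([1.0, 1.0, 0.0]), X, y, A, 0.1, 0.1))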