Individualized Multidirectional Variable Selection

  • Xiwei Tang, Fei Xue, Annie Qu
  • Published 15 September 2017
  • Computer Science
  • Journal of the American Statistical Association, pages 1280-1296
ABSTRACT: In this article, we propose a heterogeneous modeling framework that simultaneously achieves individual-wise feature selection and subgrouping of heterogeneous covariate effects. In contrast to conventional model selection approaches, the new approach constructs a separation penalty with multidirectional shrinkages, which facilitates individualized modeling to distinguish strong signals from noisy ones and selects different relevant variables for different individuals. Meanwhile, the…

Pursuing sources of heterogeneity in modeling clustered population

Three applications are presented, namely, an imaging genetics study for linking genetic factors and brain neuroimaging traits in Alzheimer's disease, a public health study for exploring the association between suicide risk among adolescents and their school district characteristics, and a sport analytics study for understanding how the salary levels of baseball players are associated with their performance and contractual status.

Structure learning via unstructured kernel-based M-regression

A general and novel framework for recovering true structures of target functions by using unstructured M-regression in a reproducing kernel Hilbert space (RKHS), inspired by the fact that gradient functions can be employed as a valid tool to learn underlying structures.

Heterogeneous Variable Selection in Nonlinear Panel Data Models: A Semi-Parametric Bayesian Approach

In an empirical application, accounting for heterogeneous variable selection and non-normality of the continuous heterogeneity is found to improve in-sample and out-of-sample performance and to yield interesting insights.

A Tree-based Model Averaging Approach for Personalized Treatment Effect Estimation from Heterogeneous Data Sources

A tree-based model averaging approach to improve the estimation accuracy of conditional average treatment effects (CATE) at a target site by leveraging models derived from other potentially heterogeneous sites, without them sharing subject-level data.

Grouped GEE Analysis for Longitudinal Data

Generalized estimating equation (GEE) is widely adopted for regression modeling for longitudinal data, taking account of potential correlations within the same subjects. Although the standard GEE

Query-augmented Active Metric Learning

An active metric learning method for clustering with pairwise constraints is proposed. It evaluates the information gain of instance pairs more accurately by incorporating the neighborhood structure, which improves clustering efficiency without extra labeling cost.

Heterogeneous Mediation Analysis on Epigenomic PTSD and Traumatic Stress in a Predominantly African American Cohort

DNA methylation (DNAm) has been suggested to play a critical role in post-traumatic stress disorder (PTSD), through mediating the relationship between trauma and PTSD. However, this underlying

Community Detection in General Hypergraph via Graph Embedding

The proposed method introduces a null vertex to augment a non-uniform hypergraph into a uniform multi-hypergraph, and then embeds the multi-hypergraph in a low-dimensional vector space such that vertices within the same community are close to each other.

Directed Community Detection With Network Embedding

Community detection in network data aims at grouping similar nodes sharing certain characteristics together. Most existing methods focus on detecting communities in undirected networks, where simil...



A Concave Pairwise Fusion Approach to Subgroup Analysis

A penalized approach for subgroup analysis based on a regression model, in which heterogeneity is driven by unobserved latent factors and thus can be represented by using subject-specific intercepts, is proposed.

Mixture Modeling for Longitudinal Data

In this article, we propose an unbiased estimating equation approach for a two-component mixture model with correlated response data. We adapt the mixture-of-experts model and a generalized linear

Fused Lasso Approach in Regression Coefficients Clustering - Learning Parameter Heterogeneity in Data Integration

This paper proposes a regularized fusion method that identifies and merges inter-study homogeneous parameter clusters in regression analysis without relying on hypothesis testing, and establishes a computationally efficient procedure to deal with large-scale integrated data.

Cluster analysis of longitudinal profiles with subgroups

In this paper, we cluster profiles of longitudinal data using a penalized regression method. Specifically, we allow heterogeneous variation of longitudinal patterns for each subject, and utilize a

Grouping Pursuit Through a Regularization Solution Surface

A novel homotopy method for computing an entire solution surface through regularization involving a piecewise linear penalty permits adaptive grouping and nearly unbiased estimation, which is treated with a novel concept of grouped subdifferentials and difference convex programming for efficient computation.

Penalized Model-Based Clustering with Application to Variable Selection

A penalized likelihood approach with an L1 penalty function is proposed, automatically realizing variable selection via thresholding and delivering a sparse solution in model-based clustering analysis with a common diagonal covariance matrix.

Model selection and estimation in regression with grouped variables

Summary. We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations with the multifactor

Longitudinal clustering for heterogeneous binary data

This paper proposes a pairwise subgrouping approach to identify subgroups and categorize similar marketing effects into groups and establishes the consistency of subgroup identification in the sense that the true underlying segmentation structure can be recovered successfully, in addition to parameter estimation consistency.

Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties

In this article, penalized likelihood approaches are proposed to handle variable selection problems, and it is shown that the newly proposed estimators perform as well as the oracle procedure in variable selection; namely, they work as well as if the correct submodel were known.

Penalized Generalized Estimating Equations for High‐Dimensional Longitudinal Data Analysis

A penalized generalized estimating equations procedure is proposed for analyzing longitudinal data with high‐dimensional covariates, which often arise in microarray experiments and large‐scale health studies. One important feature of the new procedure is that the consistency of model selection holds even if the working correlation structure is misspecified.