BOLT-SSI: A Statistical Approach to Screening Interaction Effects for Ultra-High Dimensional Data

@article{Zhou2019BOLTSSIAS,
  title={BOLT-SSI: A Statistical Approach to Screening Interaction Effects for Ultra-High Dimensional Data},
  author={Min Zhou and Mingwei Dai and Yuan Yao and Jin Liu and Can Yang and Heng Peng},
  journal={Statistica Sinica},
  year={2019}
}
Detecting interaction effects is a crucial step in various applications. In this paper, we first propose a simple method for sure screening of interactions (SSI). SSI works well for problems of moderate dimensionality, without heredity assumptions. For ultra-high dimensional problems, we propose a fast algorithm, named "BOLT-SSI". This is motivated by the fact that the interaction effects on a response variable can be exactly evaluated using the contingency table when all the variables are discrete. The…
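The contingency-table idea in the abstract can be illustrated with a minimal sketch. This is not the authors' BOLT-SSI implementation: the function names `saturated_loglik` and `null_loglik` are hypothetical, a binary response is assumed, and the comparison against a main-effects-only fit (which a test of the interaction alone would require) is left out. The sketch only shows that, for discrete predictors, the maximized log-likelihood of the saturated model (main effects plus interaction) follows directly from cell counts, with no iterative fitting.

```python
import math
from collections import Counter


def saturated_loglik(x1, x2, y):
    """Maximized Bernoulli log-likelihood of the saturated model for a
    binary response y given two discrete predictors x1 and x2.

    Assumption for illustration: y takes values 0/1. Each (x1, x2) cell
    gets its own success probability, estimated by the cell proportion.
    """
    counts = Counter(zip(x1, x2, y))           # 3-way contingency table
    cells = {(a, b) for a, b, _ in counts}     # observed (x1, x2) cells
    ll = 0.0
    for a, b in cells:
        n0 = counts[(a, b, 0)]
        n1 = counts[(a, b, 1)]
        n = n0 + n1
        for k in (n0, n1):
            if k > 0:                          # treat 0 * log(0) as 0
                ll += k * math.log(k / n)
    return ll


def null_loglik(y):
    """Maximized log-likelihood of the intercept-only model."""
    n = len(y)
    n1 = sum(y)
    ll = 0.0
    for k in (n - n1, n1):
        if k > 0:
            ll += k * math.log(k / n)
    return ll


# XOR-type data: neither predictor is marginally associated with y,
# but the pair determines y exactly.
x1 = [0, 0, 1, 1, 0, 0, 1, 1]
x2 = [0, 1, 0, 1, 0, 1, 0, 1]
y = [a ^ b for a, b in zip(x1, x2)]

# Likelihood-ratio statistic of the saturated model against the null model.
# It measures joint association (main effects plus interaction together).
joint_score = 2.0 * (saturated_loglik(x1, x2, y) - null_loglik(y))
```

In the XOR example every cell is pure, so the saturated log-likelihood is zero and the whole joint score comes from the pair of predictors acting together.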

High-Dimensional Interaction Detection With False Sign Rate Control

This article establishes some theoretical results on interaction selection for ultrahigh-dimensional quadratic regression models under random designs and proves that the examined method enjoys the same oracle inequalities as the lasso estimator and admits an explicit bound on the false sign rate.

Reluctant Interaction Modeling

A computationally efficient method that can solve a problem with 10 billion interactions with 5-fold cross-validation in under 7 hours on a single CPU is designed and theoretical results indicating favorable statistical properties are provided.

A flexible model-free prediction-based framework for feature ranking

This work proposes two ranking criteria corresponding to two prediction objectives: the classical criterion (CC) and the Neyman-Pearson criterion (NPC), both of which use model-free nonparametric implementations to accommodate diverse feature distributions.

Neyman-Pearson Criterion (NPC): A Model Selection Criterion for Asymmetric Binary Classification

A real data case study of breast cancer suggests that the Neyman-Pearson criterion is a practical criterion that leads to the discovery of novel gene markers with both high sensitivity and specificity for breast cancer diagnosis.

Discovering Categorical Main and Interaction Effects Based on Association Rule Mining

A method is proposed that uses association rules to select features and their interactions; the algorithm is then modified to address several practical concerns, and a series of experiments verifies its efficiency and effectiveness.

Structured gene‐environment interaction analysis

Simulations and analysis of GENEVA diabetes data with SNP measurements and TCGA melanoma data with gene expression measurements demonstrate the proposed structured G‐E interaction analysis to have consistency properties under high‐dimensional settings.

References

A Generic Sure Independence Screening Procedure

A generic nonparametric sure independence screening procedure, called BCor-SIS, is proposed on the basis of a recently developed universal dependence measure: Ball correlation, which shows strong screening consistency even when the dimensionality is an exponential order of the sample size without imposing sub-exponential moment assumptions on the data.

Interaction pursuit in high-dimensional multi-response regression via distance correlation

A two-stage interaction identification method in the setting of high-dimensional multi-response interaction models that exploits feature screening applied to transformed variables with distance correlation followed by feature selection, called the interaction pursuit via distance correlation (IPDC).

Interaction screening by Kendall's partial correlation for ultrahigh-dimensional data with survival trait

An inverse probability-of-censoring weighted Kendall's tau statistic is proposed to measure the association of a survival trait with biomarkers, as well as a Kendall's partial correlation statistic to measure the relationship of a survival trait with an interaction variable conditional on the main effects.

Interaction Screening for Ultrahigh-Dimensional Data

Forward-selection-based procedures called iFOR are proposed, which identify interaction effects in a greedy forward fashion while maintaining the natural hierarchical model structure; theoretically, the iFOR algorithms are shown to possess the sure screening property in ultrahigh-dimensional settings.

SIS: An R Package for Sure Independence Screening in Ultrahigh-Dimensional Statistical Models

Through the publicly available R package SIS, this work provides a unified environment to carry out variable selection using iterative sure independence screening (ISIS) and all of its variants and finds considerable improvements in terms of model selection and computational time between the algorithms and traditional penalized pseudo-likelihood methods applied directly to the full set of covariates.

Robust rank correlation based screening

Independence screening is a variable selection method that uses a ranking criterion to select significant variables, particularly for statistical models with nonpolynomial dimensionality or "large p,

High-Dimensional Interaction Detection With False Sign Rate Control

This article establishes some theoretical results on interaction selection for ultrahigh-dimensional quadratic regression models under random designs and proves that the examined method enjoys the same oracle inequalities as the lasso estimator and admits an explicit bound on the false sign rate.

Ultrahigh Dimensional Feature Selection: Beyond The Linear Model

This paper extends ISIS, without explicit definition of residuals, to a general pseudo-likelihood framework, which includes generalized linear models as a special case and improves ISIS by allowing feature deletion in the iterative process.

Forward Regression for Ultra-High Dimensional Variable Screening

The theoretical analysis reveals that FR can identify all relevant predictors consistently, even if the predictor dimension is substantially larger than the sample size, if the dimension of the true model is finite.

Innovated interaction screening for high-dimensional nonlinear classification

The theory shows that the proposed method enjoys sure screening property in interaction selection in the high-dimensional setting of p growing exponentially with the sample size, and it is proved that the classification error of the procedure can be upper-bounded by the oracle classification error plus some smaller order term.