Convex hierarchical testing of interactions

Jacob Bien, Noah Simon, and Robert Tibshirani. The Annals of Applied Statistics.
We consider the testing of all pairwise interactions in a two-class problem with many features. We devise a hierarchical testing framework that considers an interaction only when one or more of its constituent features has a nonzero main effect. The test is based on a convex optimization framework that seamlessly considers main effects and interactions together. We show—both in simulation and on a genomic dataset from the SAPPHIRe study—a potential gain in power and interpretability over a… 
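The hierarchy rule in the abstract says an interaction is considered only when at least one of its constituent features has a nonzero main effect. A simple filter can illustrate that rule. This is a minimal sketch and not the convex optimization framework the paper proposes. It assumes main-effect p-values have already been computed and treats a p-value below a hypothetical threshold `alpha` as a nonzero main effect. It then lists the pairwise interactions that remain eligible for testing. The function name `eligible_interactions` and the `strong` flag are illustrative only.

```python
from itertools import combinations


def eligible_interactions(main_pvalues, alpha=0.05, strong=False):
    """List the feature pairs (j, k), j < k, that a hierarchy rule allows.

    Hypothetical helper: feature j counts as having a nonzero main effect
    when main_pvalues[j] < alpha. Under weak hierarchy (the rule described
    in the abstract) a pair is eligible if at least one of its features is
    active; under strong hierarchy both features must be active.
    """
    active = {j for j, p in enumerate(main_pvalues) if p < alpha}
    pairs = []
    for j, k in combinations(range(len(main_pvalues)), 2):
        if strong:
            keep = j in active and k in active
        else:
            keep = j in active or k in active
        if keep:
            pairs.append((j, k))
    return pairs


if __name__ == "__main__":
    pvals = [0.001, 0.40, 0.03, 0.75]  # made-up main-effect p-values
    print(eligible_interactions(pvals))               # [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]
    print(eligible_interactions(pvals, strong=True))  # [(0, 2)]
```

With these made-up p-values, weak hierarchy leaves five of the six possible pairs to test, and strong hierarchy leaves one. This shows how a hierarchy restriction shrinks the set of interaction hypotheses.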


Interaction screening by partial correlation
This paper proposes a main-effect-adjusted interaction screening procedure that selects interactions while taking main effects into account, and develops efficient algorithms for each correlation measure to make the screening procedure scalable to high-dimensional data.
Lasso for hierarchical polynomial models
This work uses the principle of divisibility conditions implicit in polynomial hierarchy to derive versions of strong and weak hierarchy and to extend existing work in the literature, which at the moment is only concerned with models of degree two.
Penalized Interaction Estimation for Ultrahigh Dimensional Quadratic Regression
This article introduces a novel method which allows us to estimate the main effects and interactions separately in high dimensional quadratic regression, and develops an efficient ADMM algorithm to implement the penalized estimation.
Identification of gene–environment interactions with marginal penalization
A marginal penalization approach is proposed which adopts a novel penalty to directly tackle the aforementioned problems and outperforms the popular significance‐based analysis and simple penalization‐based alternatives.
Absolute Fused Lasso and Its Application to Genome-Wide Association Studies
Empirical studies on both synthetic and real-world data sets from Genome-Wide Association Studies demonstrate the efficiency and effectiveness of the proposed regularized model in simultaneously identifying important features and grouping similar features together.
A flexible model-free prediction-based framework for feature ranking
This work proposes two ranking criteria corresponding to two prediction objectives: the classical criterion (CC) and the Neyman-Pearson criterion (NPC), both of which use model-free nonparametric implementations to accommodate diverse feature distributions.
An Efficient Algorithm For Weak Hierarchical Lasso
This article proposes to directly solve the non-convex weak hierarchical Lasso by making use of the General Iterative Shrinkage and Thresholding (GIST) optimization framework, which has been shown to be efficient for solving non-convex sparse formulations.
Comparison of Correlation, Partial Correlation, and Conditional Mutual Information for Interaction Effects Screening in Generalized Linear Models
Numerous screening techniques have been developed in recent years for genome-wide association studies (GWASs) (Moore et al., 2010). In this thesis, a novel model-free screening method was developed
Gene–environment interaction identification via penalized robust divergence
Robust methods based on γ-divergence and density power divergence are proposed to accommodate contaminated data/long‐tailed distributions and can significantly outperform the existing alternatives with more accurate identification.
Neyman-Pearson Criterion (NPC): A Model Selection Criterion for Asymmetric Binary Classification
A real data case study of breast cancer suggests that the Neyman-Pearson criterion is a practical criterion that leads to the discovery of novel gene markers with both high sensitivity and specificity for breast cancer diagnosis.


A Permutation Approach to Testing Interactions in Many Dimensions
A permutation-based method for testing marginal interactions with a binary response. The method finds apparent signal and tells a believable story where logistic regression does not, and asymptotic consistency results are given under not too restrictive assumptions.
A lasso for hierarchical interactions
A precise characterization of the effect of the hierarchy constraint is given, it is proved that hierarchy holds with probability one, and an unbiased estimate of the degrees of freedom is derived; a bound on this estimate reveals the amount of fitting "saved" by the hierarchy constraint.
Penalized logistic regression for detecting gene interactions.
This work proposes using a variant of logistic regression with L2-regularization to fit gene-gene and gene-environment interaction models, and demonstrates that this method outperforms other methods both in identifying the interaction structures and in prediction accuracy.
Permutation and Parametric Bootstrap Tests for Gene–Gene and Gene–Environment Interactions
It is shown that in genetic association studies it is not typically possible to construct exact permutation tests of gene‐gene or gene‐environment interaction hypotheses, and a parametric bootstrap approach is described as an alternative to permutation for testing interactions.
Increasing the power of identifying gene × gene interactions in genome‐wide association studies
Simulation studies using known genetic models and data from existing genome‐wide association studies show that, for most plausible interaction effects, a two‐stage analysis can dramatically increase the power to identify interactions compared with a single-stage analysis.
Powerful Cocktail Methods for Detecting Genome‐Wide Gene‐Environment Interaction
This article presents a module‐based approach to integrating various methods that exploits each method's most appealing aspects and develops two novel “cocktail” methods for genome‐wide detection of gene‐environment interactions.
Multiple Testing Procedures with Applications to Genomics
This chapter discusses single-step multiple testing procedures for controlling general Type I error rates, as well as augmentation and resampling-based empirical Bayes multiple testing procedures for controlling generalized tail probability error rates.
Large-scale inference
This book discusses empirical Bayes and the James-Stein estimator, as well as large-scale hypothesis testing algorithms, prediction, and effect size estimation.
Statistical Power of Model Selection Strategies for Genome-Wide Association Studies
A novel statistical approach for power calculation is developed, accurate formulas for the power of different model selection strategies are derived, and the formulas are utilized to evaluate and compare these strategies in genetic model spaces.
Tree-structured supervised learning and the genetics of hypertension.
Jing Huang, A. Lin, R. Olshen. Proceedings of the National Academy of Sciences of the United States of America, 2004.
FlexTree, an algorithm for general supervised learning, extends the binary tree-structured approach, although it differs greatly in its selection and combination of predictors; FlexTree appears to perform better than the other technologies in terms of Bayes risk.