# Integrative High Dimensional Multiple Testing with Heterogeneity under Data Sharing Constraints

@article{Liu2021IntegrativeHD, title={Integrative High Dimensional Multiple Testing with Heterogeneity under Data Sharing Constraints}, author={Molei Liu and Yin Xia and Kelly Cho and Tianxi Cai}, journal={J. Mach. Learn. Res.}, year={2021}, volume={22}, pages={126:1-126:26} }

Identifying informative predictors in a high dimensional regression model is a critical step for association analysis and predictive modeling. Signal detection in the high dimensional setting often fails due to the limited sample size. One approach to improve power is through meta-analyzing multiple studies on the same scientific question. However, integrative analysis of high dimensional data from multiple studies is challenging in the presence of between study heterogeneity. The challenge is…

## 6 Citations

Individual Data Protected Integrative Regression Analysis of High-Dimensional Heterogeneous Data

- Computer Science
- 2019

Demonstrates the utility of the ADeLE procedure for deriving phenotyping algorithms for coronary artery disease from electronic health records data across multiple disease cohorts, and shows that the prediction and estimation errors incurred by aggregating derived data are negligible compared to the statistical minimax rate.

Inference for Maximin Effects: A Sampling Approach to Aggregating Heterogenous High-dimensional Regression Models

- Computer Science
- 2020

A novel sampling approach is devised to construct confidence intervals for any linear contrast of maximin effects in high dimensions and a ridge-type maximin effect is introduced to balance reward optimality and statistical stability.

A Note on Debiased/Double Machine Learning Logistic Partially Linear Model

- Computer Science
- 2020

Derives a debiased/double machine learning approach for the logistic partially linear model and shows that it preserves a model double robustness property when the high-dimensional nuisance models are ultra-sparse, removing the first-order bias from the nuisance model estimates.

Double/debiased machine learning for logistic partially linear model

- Computer Science
- 2020

We propose double/debiased machine learning approaches to infer (at the parametric rate) the parametric component of a logistic partially linear model with the binary response following a conditional…

Targeting Underrepresented Populations in Precision Medicine: A Federated Transfer Learning Approach

- Computer Science
- ArXiv
- 2021

Proposes a two-way data integration strategy that uses a federated transfer learning approach to integrate heterogeneous data from diverse populations and from multiple healthcare institutions. The approach improves estimation and prediction accuracy in underrepresented populations and reduces the gap in model performance across populations.

Inference for High-dimensional Maximin Effects in Heterogeneous Regression Models Using a Sampling Approach

- Computer Science
- 2020

A novel sampling approach is devised to construct the confidence interval for any linear contrast of high-dimensional maximin effects and a ridge-type maximin effect is introduced to simultaneously account for reward optimality and statistical stability.

## References

Showing 10 of 69 references.

Individual Data Protected Integrative Regression Analysis of High-Dimensional Heterogeneous Data

- Computer Science
- 2019

Demonstrates the utility of the ADeLE procedure for deriving phenotyping algorithms for coronary artery disease from electronic health records data across multiple disease cohorts, and shows that the prediction and estimation errors incurred by aggregating derived data are negligible compared to the statistical minimax rate.

Global and Simultaneous Hypothesis Testing for High-Dimensional Logistic Regression Models

- Mathematics
- Journal of the American Statistical Association
- 2021

Global testing and large-scale multiple testing for the regression coefficients are considered in both single- and two-regression settings and a lower bound for the global testing is established, which shows that the proposed test is asymptotically minimax optimal over some sparsity range.

Two-Sample Tests for High-Dimensional Linear Regression with an Application to Detecting Interactions

- Mathematics
- Statistica Sinica
- 2018

A procedure for testing the equality of the two regression vectors globally is proposed and shown to be particularly powerful against sparse alternatives, and a multiple testing procedure for identifying unequal coordinates while controlling the false discovery rate and false discovery proportion is introduced.

Distributed regression modeling for selecting markers under data protection constraints

- Computer Science
- 2018

A multivariable regression approach for identifying important markers by automatic variable selection based on aggregated data from different locations in iterative calls is proposed and a heuristic variant of the approach is provided to minimize the amount of transferred data and the number of calls.

DataSHIELD: shared individual-level analysis without sharing the data: a biostatistical perspective

- Environmental Science
- 2012

This paper explains why a DataSHIELD approach yields identical results to an individual level meta-analysis in the case of a generalised linear model, by simply using summary statistics from each study.

Uniformly valid confidence intervals for conditional treatment effects in misspecified high-dimensional models

- Mathematics, Psychology
- 2019

Eliminating the effect of confounding in observational studies typically involves fitting a model for an outcome adjusted for covariates. When, as often, these covariates are high-dimensional, this…

Joint testing and false discovery rate control in high‐dimensional multivariate regression

- Mathematics
- Biometrika
- 2018

A row‐wise multiple testing procedure is developed to identify the covariates associated with the responses and the procedure is shown to control the false discovery proportion and false discovery rate at a prespecified level asymptotically.

Robust-ODAL: Learning from heterogeneous health systems without sharing patient-level data

- Computer Science
- PSB
- 2020

This study proposes a privacy-preserving and communication-efficient distributed algorithm that accounts for heterogeneity caused by a small number of the clinical sites, and shows that the proposed method performs better than the existing distributed algorithm ODAL and a meta-analysis method.

The Benefit of Group Sparsity in Group Inference with De-biased Scaled Group Lasso

- Mathematics
- 2014

We study confidence regions and approximate chi-squared tests for variable groups in high-dimensional linear regression. When the size of the group is small, low-dimensional projection estimators for…

High-dimensional econometrics and regularized GMM

- Computer Science, Mathematics
- 2018

This chapter presents key concepts and theoretical results for analyzing estimation and inference in high-dimensional models, within a framework where estimators of the parameters of interest may be represented directly as approximate means.