# Two-Sample Testing in High-Dimensional Models

@article{Stadler2012TwoSampleTI, title={Two-Sample Testing in High-Dimensional Models}, author={Nicolas Stadler and Sach Mukherjee}, journal={arXiv: Methodology}, year={2012} }

We propose novel methodology for testing equality of model parameters between two high-dimensional populations. The technique is very general and applicable to a wide range of models. The method is based on sample splitting: the data is split into two parts; on the first part we reduce the dimensionality of the model to a manageable size; on the second part we perform significance testing (p-value calculation) based on a restricted likelihood ratio statistic. Assuming that both populations…
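The sample-splitting idea in the abstract can be sketched for the special case of linear regression. This is only an illustration under stated assumptions, not the authors' procedure: the function names (`screen`, `rss`, `lr_stat`, `split_test`) are hypothetical, marginal-correlation screening stands in for whatever dimensionality reduction the paper uses, Gaussian errors with a common variance are assumed, and a label-permutation p-value (which additionally assumes the two populations share a covariate distribution under the null) replaces the paper's null distribution for the restricted likelihood ratio statistic.

```python
import numpy as np


def screen(X, y, k):
    """Indices of the k covariates with the largest absolute marginal correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    score = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) + 1e-12)
    return np.argsort(score)[-k:]


def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def lr_stat(X1, y1, X2, y2):
    """Gaussian log-likelihood ratio: one pooled fit versus two separate fits."""
    n = len(y1) + len(y2)
    rss_joint = rss(np.vstack([X1, X2]), np.concatenate([y1, y2]))
    rss_sep = rss(X1, y1) + rss(X2, y2)
    return float(n * np.log(rss_joint / rss_sep))


def split_test(X1, y1, X2, y2, k=3, n_perm=200, seed=0):
    """Return (p_value, active_set) for H0: both populations share one regression model."""
    rng = np.random.default_rng(seed)

    # Step 1: split each population into a screening half and a testing half.
    i1, i2 = rng.permutation(len(y1)), rng.permutation(len(y2))
    h1, h2 = len(y1) // 2, len(y2) // 2
    s1, t1 = i1[:h1], i1[h1:]
    s2, t2 = i2[:h2], i2[h2:]

    # Step 2: reduce dimensionality using the screening halves only
    # (union of the covariates selected in each population).
    active = np.union1d(screen(X1[s1], y1[s1], k), screen(X2[s2], y2[s2], k))

    # Step 3: likelihood ratio statistic on the testing halves,
    # restricted to the selected covariates.
    A1, b1 = X1[t1][:, active], y1[t1]
    A2, b2 = X2[t2][:, active], y2[t2]
    observed = lr_stat(A1, b1, A2, b2)

    # Assumption: calibrate by permuting population labels on the testing halves.
    A, b = np.vstack([A1, A2]), np.concatenate([b1, b2])
    m = len(b1)
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(len(b))
        stat = lr_stat(A[p[:m]], b[p[:m]], A[p[m:]], b[p[m:]])
        exceed += int(stat >= observed)
    return (1 + exceed) / (1 + n_perm), active
```

Calling `split_test(X1, y1, X2, y2)` returns a p-value together with the selected covariate indices. Because screening and testing use disjoint halves of the data, the selection step does not bias the test statistic; each testing half must contain more observations than selected covariates for the least-squares fits to be well determined.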

## 14 Citations

### A global homogeneity test for high-dimensional linear regression

- Computer Science, Mathematics
- 2013

A testing procedure is developed for high-dimensional settings where the number of covariates p is larger than the numbers of observations n₁ and n₂ of the two samples; it is proved to be minimax adaptive to the sparsity.

### TWO-SAMPLE TESTING OF HIGH-DIMENSIONAL LINEAR REGRESSION COEFFICIENTS VIA COMPLEMENTARY SKETCHING

- Mathematics, Computer Science
- 2022

A new method is introduced for two-sample testing of high-dimensional linear regression coefficients without assuming that those coefficients are individually estimable; it is shown to have essentially optimal asymptotic power under a Gaussian design.

### Two-sample testing of high-dimensional linear regression coefficients via complementary sketching.

- Mathematics
- 2020

We introduce a new method for two-sample testing of high-dimensional linear regression coefficients without assuming that those coefficients are individually estimable. The procedure works by first…

### LOCALIZING DIFFERENTIALLY EVOLVING COVARIANCE STRUCTURES VIA SCAN STATISTICS.

- Computer Science, Quarterly of Applied Mathematics
- 2019

This work first gives a parametric model for estimating trends in the space of SPD matrices as a function of one or more covariates, and generalizes scan statistics to graph structures, to search over distinct subsets of features whose temporal dependency structure may show statistically significant group-wise differences.

### High-dimensional regression over disease subgroups

- Computer Science, bioRxiv
- 2016

This work is motivated by biomedical problems in which disease subtypes, for example, may differ with respect to their underlying regression models while sample sizes at the subgroup level are limited; it treats subgroups as related problem instances and jointly estimates subgroup-specific regression coefficients.

### Finding Differentially Covarying Needles in a Temporally Evolving Haystack: A Scan Statistics Perspective

- Computer Science, ArXiv
- 2017

This work first gives a parametric model for estimating trends in the space of SPD matrices as a function of one or more covariates, and generalizes scan statistics to graph structures, to search over distinct subsets of features whose temporal dependency structure may show statistically significant group-wise differences.

### Multivariate gene-set testing based on graphical models.

- Biology, Biostatistics
- 2015

This paper proposes a novel approach for gene-set analysis that allows for truly multivariate hypotheses, in particular differences in gene-gene networks between conditions, and shows results using high-throughput data from several studies in cancer biology.

### Network-based multivariate gene-set testing

- Biology
- 2013

A novel approach for gene-set analysis that allows for truly multivariate hypotheses, in particular differences in gene-gene networks between conditions, is proposed.

### Confidence intervals for high-dimensional inverse covariance estimation

- Computer Science
- 2014

We propose methodology for statistical inference for low-dimensional parameters of sparse precision matrices in a high-dimensional setting. Our method leads to a non-sparse estimator of the precision…

### Molecular heterogeneity at the network level: high-dimensional testing, clustering and a TCGA case study

- Computer Science, Biology, Bioinform.
- 2017

Recent ideas from high-dimensional statistics for testing and clustering are leveraged in the network biology setting; the approach can be applied directly to most continuous molecular measurements, and networks do not need to be specified beforehand.

## References

SHOWING 1-10 OF 24 REFERENCES

### A More Powerful Two-Sample Test in High Dimensions using Random Projection

- Mathematics, Computer Science, NIPS
- 2011

This work proposes a new test statistic for the two-sample test of means that integrates a random projection with the classical Hotelling T² statistic, derives an asymptotic power function for this test, and demonstrates superior performance against competing tests in the parameter regimes anticipated by the theoretical results.

### p-Values for High-Dimensional Regression

- Computer Science
- 2008

Inference across multiple random splits can be aggregated while maintaining asymptotic control over the inclusion of noise variables, and it is shown that the resulting p-values can be used for control of both family-wise error and false discovery rate.

### EFFECT OF HIGH DIMENSION: BY AN EXAMPLE OF A TWO SAMPLE PROBLEM

- Mathematics
- 1999

With the rapid development of modern computing techniques, statisticians are dealing with data with much higher dimension. Consequently, due to their loss of accuracy or power, some classical…

### Two-Sample Covariance Matrix Testing and Support Recovery in High-Dimensional and Sparse Settings

- Computer Science
- 2013

A new test of the hypothesis H₀ is proposed and shown to enjoy certain optimality properties and to be especially powerful against sparse alternatives; applications to gene selection are discussed.

### Statistical significance in high-dimensional linear models

- Computer Science, Mathematics
- 2012

This work proposes a method for constructing p-values for general hypotheses in a high-dimensional linear model based on Ridge estimation with an additional correction term due to a substantial projection bias in high dimensions.

### HIGH DIMENSIONAL VARIABLE SELECTION.

- Economics, Annals of Statistics
- 2009

This paper looks at the error rates and power of some multi-stage regression methods and considers three screening methods: the lasso, marginal regression, and forward stepwise regression.

### Variable Selection via Penalized Likelihood

- Mathematics, Computer Science
- 1999

The sieve likelihood ratio statistics are shown to be general and powerful for nonparametric inferences based on function estimation and can be adaptively optimal in the sense of Spokoiny (1996) by using a simple choice of adaptive smoothing parameter.

### Two Sample Tests for High Dimensional Covariance Matrices

- Mathematics, Computer Science
- 2012

Two tests for the equality of covariance matrices between two high-dimensional populations are proposed which surpass the capability of the conventional likelihood ratio test and can be used to test on covariances associated with gene ontology terms.

### Model selection and estimation in regression with grouped variables

- Mathematics
- 2006

Summary. We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations with the multifactor…

### Regression Shrinkage and Selection via the Lasso

- Computer Science
- 1996

A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.