• Corpus ID: 88519297

Two-Sample Testing in High-Dimensional Models

@article{Stadler2012TwoSampleTI,
  title={Two-Sample Testing in High-Dimensional Models},
  author={Nicolas Stadler and Sach Mukherjee},
  journal={arXiv: Methodology},
  year={2012}
}
We propose novel methodology for testing equality of model parameters between two high-dimensional populations. The technique is very general and applicable to a wide range of models. The method is based on sample splitting: the data is split into two parts; on the first part we reduce the dimensionality of the model to a manageable size; on the second part we perform significance testing (p-value calculation) based on a restricted likelihood ratio statistic. Assuming that both populations… 
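The split-then-test idea in the abstract can be sketched for the special case of two linear regressions. This is a minimal illustration under stated assumptions, not the authors' procedure: the screening rule (top-k marginal covariance), the Gaussian likelihood without intercept, and the permutation calibration of the p-value are all choices made here for the sketch, and every function name is hypothetical. The permutation step additionally assumes that, under the null, the (covariate, response) pairs of both populations are exchangeable.

```python
import numpy as np

def gaussian_loglik(X, y):
    """Maximized Gaussian log-likelihood of a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n = len(y)
    sigma2 = resid @ resid / n  # maximum-likelihood variance estimate
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

def screen(X, y, k):
    """Hypothetical screening rule: indices of the k largest |marginal covariances|."""
    score = np.abs((X - X.mean(axis=0)).T @ (y - y.mean()))
    return np.argsort(score)[-k:]

def lr_stat(X1, y1, X2, y2):
    """Likelihood ratio: separate fits per population versus one pooled fit."""
    separate = gaussian_loglik(X1, y1) + gaussian_loglik(X2, y2)
    pooled = gaussian_loglik(np.vstack([X1, X2]), np.concatenate([y1, y2]))
    return 2 * (separate - pooled)

def split_two_sample_test(X1, y1, X2, y2, k=5, n_perm=500, seed=0):
    """Screen on one half of each sample, test on the other half."""
    rng = np.random.default_rng(seed)
    i1, i2 = rng.permutation(len(y1)), rng.permutation(len(y2))
    a1, b1 = i1[: len(i1) // 2], i1[len(i1) // 2:]
    a2, b2 = i2[: len(i2) // 2], i2[len(i2) // 2:]

    # Part 1: reduce dimensionality using only the first halves.
    active = np.union1d(screen(X1[a1], y1[a1], k), screen(X2[a2], y2[a2], k))

    # Part 2: likelihood ratio restricted to the screened covariates,
    # computed only on the held-out second halves.
    Z1, Z2 = X1[b1][:, active], X2[b2][:, active]
    w1, w2 = y1[b1], y2[b2]
    observed = lr_stat(Z1, w1, Z2, w2)

    # Assumption of this sketch: calibrate by permuting population labels.
    Z, w, n1 = np.vstack([Z1, Z2]), np.concatenate([w1, w2]), len(w1)
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(len(w))
        stat = lr_stat(Z[p[:n1]], w[p[:n1]], Z[p[n1:]], w[p[n1:]])
        exceed += int(stat >= observed)
    return (1 + exceed) / (1 + n_perm)
```

Because screening and testing use disjoint observations, the selection step does not bias the test statistic computed on the held-out half; that separation is the point of the split.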

A global homogeneity test for high-dimensional linear regression

TLDR
A testing procedure is developed for high-dimensional settings where the number of covariates p is larger than the numbers of observations n_1 and n_2 of the two samples, and it is proved to be minimax adaptive to the sparsity.

TWO-SAMPLE TESTING OF HIGH-DIMENSIONAL LINEAR REGRESSION COEFFICIENTS VIA COMPLEMENTARY SKETCHING

TLDR
A new method is introduced for two-sample testing of high-dimensional linear regression coefficients without assuming that those coefficients are individually estimable; it is shown to have essentially optimal asymptotic power under a Gaussian design.

Two-sample testing of high-dimensional linear regression coefficients via complementary sketching.

We introduce a new method for two-sample testing of high-dimensional linear regression coefficients without assuming that those coefficients are individually estimable. The procedure works by first

LOCALIZING DIFFERENTIALLY EVOLVING COVARIANCE STRUCTURES VIA SCAN STATISTICS.

TLDR
This work first gives a parametric model for estimating trends in the space of SPD matrices as a function of one or more covariates, and generalizes scan statistics to graph structures, to search over distinct subsets of features whose temporal dependency structure may show statistically significant group-wise differences.

High-dimensional regression over disease subgroups

TLDR
Motivated by biomedical problems in which disease subtypes may follow different underlying regression models while subgroup-level sample sizes are limited, this work treats subgroups as related problem instances and jointly estimates subgroup-specific regression coefficients.

Finding Differentially Covarying Needles in a Temporally Evolving Haystack: A Scan Statistics Perspective

TLDR
This work first gives a parametric model for estimating trends in the space of SPD matrices as a function of one or more covariates, and generalizes scan statistics to graph structures, to search over distinct subsets of features whose temporal dependency structure may show statistically significant group-wise differences.

Multivariate gene-set testing based on graphical models.

TLDR
This paper proposes a novel approach for gene-set analysis that allows for truly multivariate hypotheses, in particular differences in gene-gene networks between conditions, and shows results using high-throughput data from several studies in cancer biology.

Network-based multivariate gene-set testing

TLDR
A novel approach for gene-set analysis that allows for truly multivariate hypotheses, in particular differences in gene-gene networks between conditions, is proposed.

Confidence intervals for high-dimensional inverse covariance estimation

We propose methodology for statistical inference for low-dimensional parameters of sparse precision matrices in a high-dimensional setting. Our method leads to a non-sparse estimator of the precision

Molecular heterogeneity at the network level: high-dimensional testing, clustering and a TCGA case study

TLDR
Recent ideas from high-dimensional statistics for testing and clustering are leveraged in the network biology setting. The approach can be applied directly to most continuous molecular measurements, and networks do not need to be specified beforehand.

References


A More Powerful Two-Sample Test in High Dimensions using Random Projection

TLDR
This work proposes a new test statistic for the two-sample test of means that integrates a random projection with the classical Hotelling T^2 statistic, derives an asymptotic power function for this test, and demonstrates superior performance against competing tests in the parameter regimes anticipated by the theoretical results.

p-Values for High-Dimensional Regression

TLDR
Inference across multiple random splits can be aggregated while maintaining asymptotic control over the inclusion of noise variables, and it is shown that the resulting p-values can be used for control of both family-wise error and false discovery rate.

EFFECT OF HIGH DIMENSION: BY AN EXAMPLE OF A TWO SAMPLE PROBLEM

With the rapid development of modern computing techniques, statisticians are dealing with data with much higher dimension. Consequently, due to their loss of accuracy or power, some classical

Two-Sample Covariance Matrix Testing and Support Recovery in High-Dimensional and Sparse Settings

TLDR
A new test of the null hypothesis H_0 that the two covariance matrices are equal is proposed. It is shown to enjoy certain optimality and to be especially powerful against sparse alternatives, and applications to gene selection are discussed.

Statistical significance in high-dimensional linear models

TLDR
This work proposes a method for constructing p-values for general hypotheses in a high-dimensional linear model based on Ridge estimation with an additional correction term due to a substantial projection bias in high dimensions.

HIGH DIMENSIONAL VARIABLE SELECTION.

TLDR
This paper looks at the error rates and power of some multi-stage regression methods and considers three screening methods: the lasso, marginal regression, and forward stepwise regression.

Variable Selection via Penalized Likelihood

TLDR
The sieve likelihood ratio statistics are shown to be general and powerful for nonparametric inferences based on function estimation and can be adaptively optimal in the sense of Spokoiny (1996) by using a simple choice of adaptive smoothing parameter.

Two Sample Tests for High Dimensional Covariance Matrices

TLDR
Two tests for the equality of covariance matrices between two high-dimensional populations are proposed which surpass the capability of the conventional likelihood ratio test and can be used to test on covariances associated with gene ontology terms.

Model selection and estimation in regression with grouped variables

Summary. We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations with the multifactor

Regression Shrinkage and Selection via the Lasso

TLDR
A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
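
The constrained form described in this summary can be written out as follows. The notation is an assumption made here for illustration (responses y_i, covariates x_{ij}, intercept \beta_0, coefficients \beta_j, and bound t \ge 0) and is not taken from the listing itself.

```latex
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta_0,\,\beta}
    \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
  \quad \text{subject to} \quad
  \sum_{j=1}^{p} \lvert \beta_j \rvert \le t .
```

Smaller values of t shrink the coefficients more strongly and set some of them exactly to zero, which is why lasso-type estimators are a common choice for reducing dimensionality before testing.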