Corpus ID: 7725308

On the consistency theory of high dimensional variable screening

@article{Wang2015OnTC,
  title={On the consistency theory of high dimensional variable screening},
  author={Xiangyu Wang and Chenlei Leng and David B. Dunson},
  journal={ArXiv},
  year={2015},
  volume={abs/1502.06895}
}
Variable screening is a fast dimension reduction technique for assisting high dimensional feature selection. As a preselection method, it selects a moderate-sized subset of candidate variables, which is then refined via feature selection to produce the final model. The performance of variable screening depends on both computational efficiency and the ability to dramatically reduce the number of variables without discarding the important ones. When the data dimension $p$ is substantially larger…
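
For concreteness, a minimal sketch of this kind of preselection step (marginal, SIS-style correlation screening) in Python; the subset size `d`, the simulated data, and the function name are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def marginal_screen(X, y, d):
    """Rank predictors by absolute marginal correlation with y and
    keep the d top-ranked columns (a SIS-style preselection step)."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize columns
    yc = y - y.mean()
    scores = np.abs(Xc.T @ yc) / len(y)         # |marginal correlation| up to scale
    return np.argsort(scores)[::-1][:d]         # indices of the d strongest predictors

# Example: p = 1000 candidate variables, keep a moderate subset of size d = 50
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1000))
y = X[:, :5] @ np.ones(5) + rng.standard_normal(200)
keep = marginal_screen(X, y, d=50)
```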

A Generic Sure Independence Screening Procedure

TLDR
A generic nonparametric sure independence screening procedure, called BCor-SIS, is proposed on the basis of a recently developed universal dependence measure, Ball correlation; the procedure enjoys strong screening consistency even when the dimensionality is an exponential order of the sample size, without imposing sub-exponential moment assumptions on the data.

A generalized knockoff procedure for FDR control in structural change detection

On the support recovery of marginal regression.

TLDR
This work identifies the underlying factors, which the authors denote as "MR incoherence", affecting MR's support recovery performance, and provides a much more nuanced and optimistic view of MR in comparison to previous works.

DECOrrelated feature space partitioning for distributed sparse regression

TLDR
By incorporating the decorrelation step, DECO achieves consistent variable selection and parameter estimation on each subset under (almost) no assumptions; the convergence rate is nearly minimax optimal for both sparse and weakly sparse models and does not depend on the number of partitions.
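
A rough sketch of the decorrelation idea summarized above, assuming it amounts to premultiplying the data by a ridge-regularized inverse square root of $XX^\top$ before partitioning the columns across workers; the ridge constant `r` and the scaling are my assumptions, not taken from the paper.

```python
import numpy as np

def decorrelate(X, y, r=1.0):
    """Whiten the row space of X with a ridge-regularized inverse square root
    of X X^T / p, so that column blocks of the transformed design are roughly
    decorrelated before they are distributed to workers."""
    n, p = X.shape
    G = X @ X.T / p + r * np.eye(n)          # regularized row-space Gram matrix
    w, V = np.linalg.eigh(G)
    F = V @ np.diag(w ** -0.5) @ V.T         # G^{-1/2}
    return F @ X, F @ y

# Each worker would then run a sparse regression (e.g. a lasso) on its own column block.
```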

No penalty no tears: Least squares in high-dimensional linear models

TLDR
This work advocates the use of a generalized version of OLS motivated by ridge regression, and proposes two novel three-step algorithms involving least squares fitting and hard thresholding for problems with dimensionality larger than the sample size.
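
A minimal sketch of a three-step pipeline of the kind this summary describes (ridge-type fit, hard thresholding, least-squares refit); the specific estimator, subset size `d`, and ridge constant `r` are illustrative assumptions, not necessarily the authors' exact algorithms.

```python
import numpy as np

def ridge_threshold_ols(X, y, r=1.0, d=50):
    """Step 1: ridge-type estimate usable when p > n.
    Step 2: hard-threshold to the d largest coefficients.
    Step 3: refit ordinary least squares on the retained columns."""
    n, p = X.shape
    beta_ridge = X.T @ np.linalg.solve(X @ X.T + r * np.eye(n), y)   # ridge-type estimate
    keep = np.argsort(np.abs(beta_ridge))[::-1][:d]                  # hard thresholding
    beta_ols, *_ = np.linalg.lstsq(X[:, keep], y, rcond=None)        # OLS refit on the subset
    return keep, beta_ols
```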

M-Mix: Generating Hard Negatives via Multi-sample Mixing for Contrastive Learning

TLDR
This work proposes M-Mix, which dynamically generates a sequence of hard negatives through a pairwise mixup operation, and achieves state-of-the-art performance on vision tasks under self-supervised settings.

Distributed Feature Selection in Large n and Large p Regression Problems

Distributed Feature Selection in Large n and Large p Regression Problems, by Xiangyu Wang, Department of Statistical Science, Duke University. Approved by David B. Dunson, Supervisor.

References

Showing 1-10 of 23 references.

High dimensional ordinary least squares projection for screening variables

TLDR
It is shown that HOLP has the sure screening property, gives consistent variable selection without strong correlation assumptions, and has low computational complexity.
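
For reference, a minimal sketch of the HOLP screening estimator $\hat\beta = X^\top (XX^\top)^{-1} y$ described here, ranking coefficients by magnitude; the subset size `d` is an illustrative choice.

```python
import numpy as np

def holp_screen(X, y, d):
    """High dimensional OLS projection: beta_hat = X^T (X X^T)^{-1} y,
    then keep the d coefficients largest in magnitude (p > n assumed)."""
    beta_hat = X.T @ np.linalg.solve(X @ X.T, y)
    return np.argsort(np.abs(beta_hat))[::-1][:d]
```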

Robust rank correlation based screening

Independence screening is a variable selection method that uses a ranking criterion to select significant variables, particularly for statistical models with nonpolynomial dimensionality or "large p, small n" paradigms…
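
A minimal sketch of rank-correlation-based screening in the spirit of this entry, using Kendall's $\tau$ as the ranking criterion; the choice of statistic and the cutoff `d` are my assumptions, and the paper's exact criterion may differ.

```python
import numpy as np
from scipy.stats import kendalltau

def rank_screen(X, y, d):
    """Rank predictors by |Kendall's tau| with the response and keep the top d.
    Rank correlations keep the ranking stable under heavy tails and monotone
    transformations, which is the point of robust rank-based screening."""
    taus = np.array([kendalltau(X[:, j], y)[0] for j in range(X.shape[1])])
    return np.argsort(np.abs(taus))[::-1][:d]
```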

Forward Regression for Ultra-High Dimensional Variable Screening

TLDR
The theoretical analysis reveals that FR can identify all relevant predictors consistently, even if the predictor dimension is substantially larger than the sample size, provided the dimension of the true model is finite.
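
A minimal greedy forward-regression sketch matching the description above (at each step, add the predictor that most reduces the residual sum of squares, stopping after `d` steps); the fixed step count is a placeholder, not the paper's stopping criterion.

```python
import numpy as np

def forward_regression(X, y, d):
    """Greedy forward selection: repeatedly add the column giving the
    largest drop in residual sum of squares, until d columns are chosen."""
    selected = []
    for _ in range(d):
        best_j, best_rss = None, np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            Xs = X[:, selected + [j]]
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = np.sum((y - Xs @ beta) ** 2)
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    return selected
```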

High dimensional variable selection via tilting

TLDR
The proposed tilting procedure adaptively chooses between marginal correlation and tilted correlation for each variable, with the choice driven by the hard-thresholded sample correlations of the design matrix.

On Model Selection Consistency of Lasso

TLDR
It is proved that a single condition, which is called the Irrepresentable Condition, is almost necessary and sufficient for Lasso to select the true model both in the classical fixed p setting and in the large p setting as the sample size n gets large.
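
For reference, the condition in question is usually written as follows (notation from memory: $S$ is the true support, $S^c$ its complement, $\beta_S$ the nonzero coefficients, and $\eta \in (0,1]$ a constant):

$\left\| X_{S^c}^{\top} X_S \big( X_S^{\top} X_S \big)^{-1} \operatorname{sign}(\beta_S) \right\|_{\infty} \le 1 - \eta,$

i.e., the irrelevant columns must not be too correlated with the relevant ones after projecting onto the true support.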

The sparsity and bias of the Lasso selection in high-dimensional linear regression

Meinshausen and Bühlmann [Ann. Statist. 34 (2006) 1436-1462] showed that, for neighborhood selection in Gaussian graphical models, under a neighborhood stability condition, the LASSO is consistent…

Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties

TLDR
In this article, penalized likelihood approaches are proposed to handle variable selection problems, and it is shown that the newly proposed estimators perform as well as the oracle procedure in variable selection; namely, they work as well as if the correct submodel were known.
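
The nonconcave penalty proposed there is SCAD; in its standard form (with tuning parameters $\lambda > 0$ and $a > 2$, and $t \ge 0$) it is defined through its derivative

$p_{\lambda}'(t) = \lambda \left\{ I(t \le \lambda) + \frac{(a\lambda - t)_{+}}{(a - 1)\lambda}\, I(t > \lambda) \right\},$

so the penalty is linear near zero, like the lasso, but levels off for large coefficients, which is what removes the bias on large effects and yields the oracle property.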

Nearly unbiased variable selection under minimax concave penalty

TLDR
It is proved that at a universal penalty level, the MC+ has high probability of matching the signs of the unknowns, and thus correct selection, without assuming the strong irrepresentable condition required by the LASSO.
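
For reference, the minimax concave penalty in its standard form, with regularization parameter $\lambda > 0$ and concavity parameter $\gamma > 1$, is

$\rho(t; \lambda) = \lambda \int_0^{|t|} \Big( 1 - \frac{x}{\gamma \lambda} \Big)_{+} \, dx,$

which applies the full lasso-rate penalization near zero and tapers to a constant penalty once $|t| \ge \gamma\lambda$, hence the near-unbiasedness of large coefficients.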

On model selection consistency of M-estimators with geometrically decomposable penalties

TLDR
A general framework for establishing consistency and model selection consistency of M-estimators with geometrically decomposable penalties is developed and results for some special cases of interest in bioinformatics and statistical learning are derived.

Sharp thresholds for high-dimensional and noisy recovery of sparsity

TLDR
This work analyzes the behavior of $\ell_1$-constrained quadratic programming (QP), also referred to as the Lasso, for recovering the sparsity pattern and establishes a sharp relation between the problem dimension $p$ and the number of observations that are required for reliable recovery.
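
For the standard Gaussian design, the sharp relation referred to here takes the form of a threshold of order $s \log(p - s)$ for a model with $s$ nonzero coefficients: recovery succeeds with high probability once

$n > 2\, s \log(p - s) + s + 1,$

and fails below a matching multiple of $s \log(p - s)$ (constants quoted from memory and should be checked against the paper).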