• Corpus ID: 218869937

Robust Sure Independence Screening for Non-polynomial dimensional Generalized Linear Models

Abhik Ghosh, Erica Ponzi, Torkjel M Sandanger, Magne Thoresen
arXiv: Statistics Theory

We consider the problem of variable screening in ultra-high dimensional (non-polynomial order) generalized linear models (GLMs). Because the popular sure independence screening (SIS) approach is extremely unstable in the presence of contamination and noise, which frequently arise in large-scale data (e.g., omics data), we discuss a new robust screening procedure based on the minimum density power divergence estimator (MDPDE) of the marginal regression coefficients. Our proposed screening procedure performs…

A robust variable screening procedure for ultra-high dimensional data

A new robust screening procedure is developed based on the density power divergence (DPD) estimation approach. DPD-SIS and its iterative extension are introduced; they are superior to the original SIS and to other robust methods when the data contain outliers.

Sure independence screening in generalized linear models with NP-dimensionality

It is shown that the proposed methods also possess the sure screening property with vanishing false selection rate, which justifies the applicability of such a simple method in a wide spectrum.

Robust sure independence screening for ultrahigh dimensional non-normal data

A new robust sure independence screening (RoSIS) procedure is proposed, based on the correlation between each predictor and the distribution function of the response. It reduces ultrahigh dimensionality effectively and is robust to heavy tails or extreme values in the response.

Sure independence screening for ultrahigh dimensional feature space

The concept of sure screening is introduced and a sure screening method that is based on correlation learning, called sure independence screening, is proposed to reduce dimensionality from high to a moderate scale that is below the sample size.
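As a rough illustration of the correlation-learning idea summarized above, the sketch below ranks predictors by the absolute value of their marginal sample correlation with the response and keeps the top `d`, with `d` below the sample size. The function name `sis_screen`, the simulated data, and the cutoff `d = n / log(n)` are assumptions made for this example; it is a minimal sketch, not the implementation from any of the listed papers or packages.

```python
import numpy as np

def sis_screen(X, y, d):
    """Return the indices of the d predictors with the largest
    absolute marginal correlation with y, plus all correlations."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # center each column
    yc = y - y.mean()                # center the response
    denom = np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    denom[denom == 0] = np.inf       # constant columns get correlation 0
    corr = Xc.T @ yc / denom         # marginal Pearson correlations
    order = np.argsort(-np.abs(corr))
    return order[:d], corr

# Simulated example (hypothetical data): p >> n, three active predictors.
rng = np.random.default_rng(0)
n, p = 100, 2000
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -3.0, 3.0]
y = X @ beta + rng.standard_normal(n)

d = int(n / np.log(n))               # assumed cutoff, smaller than n
selected, corr = sis_screen(X, y, d)
print(sorted(selected.tolist()))
```

Because the ranking relies on the ordinary sample correlation, a few outlying observations can change it substantially, which is the instability that the robust variants listed here aim to address.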

SIS: An R Package for Sure Independence Screening in Ultrahigh-Dimensional Statistical Models

Through the publicly available R package SIS, this work provides a unified environment to carry out variable selection using iterative sure independence screening (ISIS) and all of its variants and finds considerable improvements in terms of model selection and computational time between the algorithms and traditional penalized pseudo-likelihood methods applied directly to the full set of covariates.

A robust variable screening method for high-dimensional data

Both the simulation results and the real-life data analysis demonstrate that the proposed method can greatly control the adverse effect after detecting and removing those unusual observations, and performs better than the competing methods.

Robust rank correlation based screening

Independence screening is a variable selection method that uses a ranking criterion to select significant variables, particularly for statistical models with nonpolynomial dimensionality or "large p,

Sure Screening for Gaussian Graphical Models

The proposed Graphical sure screening, or GRASS, is a very simple and computationally-efficient screening procedure for recovering the structure of a Gaussian graphical model in the high-dimensional setting and possesses the sure screening property.

Some notes on robust sure independence screening

Two versions of SIS that are robust against outliers are provided. Each replaces the sample correlation in SIS with a robust measure and screens variables by ranking them; both are highly robust against a substantial fraction of outliers in the data.

Conditional Sure Independence Screening

The conditions under which sure screening is possible are given, and an upper bound on the number of selected variables is derived. The situation under which CSIS yields model selection consistency and the properties of CSIS when a data-driven conditioning set is used are also spelled out.