False Discovery Rate Control via Debiased Lasso

Adel Javanmard and Hamid Javadi
We consider the problem of variable selection in high-dimensional statistical models, where the goal is to report a set of variables, out of many predictors $X_1, \dotsc, X_p$, that are relevant to a response of interest. For the high-dimensional linear model, where the number of parameters exceeds the number of samples $(p>n)$, we propose a procedure for variable selection and prove that it controls the \emph{directional} false discovery rate (FDR) below a pre-assigned significance level $q\in [0…
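The full procedure is developed in the paper itself; purely as a rough illustration, a Benjamini-Hochberg-style step-up rule applied to debiased z-scores, reporting each selected variable together with its estimated sign, could be sketched as below. All function names are hypothetical, and this is a simplified stand-in, not the paper's exact procedure.

```python
import math

def bh_select(pvals, q=0.1):
    """Benjamini-Hochberg step-up: indices selected at nominal FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose ordered p-value clears the BH line q*rank/m
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank
    return sorted(order[:k])

def directional_select(z_scores, q=0.1):
    """Select variables and report estimated effect signs from debiased z-scores.

    Two-sided p-values under a standard normal null feed a BH step-up rule;
    the sign of the debiased statistic gives the reported direction.
    """
    pvals = [2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
             for z in z_scores]
    selected = bh_select(pvals, q)
    return [(j, 1 if z_scores[j] > 0 else -1) for j in selected]
```

For example, `directional_select([5.1, -4.2, 0.3, 0.1], q=0.1)` keeps the two strong coordinates and reports their signs, discarding the near-null ones.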


Relaxing the assumptions of knockoffs by conditioning

The recent paper Cand\`es et al. (2018) introduced model-X knockoffs, a method for variable selection that provably and non-asymptotically controls the false discovery rate with no restrictions or…

Directional FDR Control for Sub-Gaussian Sparse GLMs

It is shown that the proposed debiased statistics can asymptotically control the directional (sign) FDR, and hence the number of directionally false discoveries, at a pre-specified significance level, including for two-sample problems.

A Scale-free Approach for False Discovery Rate Control in Generalized Linear Models

Generalized linear models (GLMs) have been widely used in practice to model non-Gaussian response variables. When the number of explanatory features is relatively large, scientific researchers are…

Online Debiasing for Adaptively Collected High-dimensional Data

It is demonstrated that online debiasing optimally debiases the LASSO estimate when the underlying parameter $\theta_0$ has sparsity of order $o(\sqrt{n}/\log p)$.

False Discovery Rate Control Under General Dependence By Symmetrized Data Aggregation

The proposed SDA filter first constructs a sequence of ranking statistics that fulfill global symmetry properties, and then chooses a data-driven threshold along the ranking to control the FDR; the asymptotic validity of SDA is established for both FDR and false discovery proportion (FDP) control under mild regularity conditions.

Stepdown SLOPE for Controlled Feature Selection

Two new SLOPE variants are proposed to control high-dimensional feature selection by adaptively imposing a non-increasing sequence of tuning parameters on Sorted L-One Penalized Estimation; the stepdown-based SLOPE controls the probability of false rejections and the false discovery proportion.

Inference in Sparsity-Induced Weak Factor Models

Abstract In this article, we consider statistical inference for high-dimensional approximate factor models. We posit a weak factor structure, in which the factor loading matrix can be sparse and the…

Global and Simultaneous Hypothesis Testing for High-Dimensional Logistic Regression Models

Global testing and large-scale multiple testing for the regression coefficients are considered in both single- and two-regression settings and a lower bound for the global testing is established, which shows that the proposed test is asymptotically minimax optimal over some sparsity range.

Inference in Weak Factor Models

In this paper, we consider statistical inference for high-dimensional approximate factor models. We posit a weak factor structure, in which the factor loading matrix can be sparse and the signal…

Power of FDR Control Methods: The Impact of Ranking Algorithm, Tampered Design, and Symmetric Statistic

The Rare/Weak signal model, popular in the multiple testing and variable selection literature, is adopted, and the rate of convergence of the number of false positives and the number of false negatives of FDR control methods is characterized for particular classes of designs.



Debiasing the lasso: Optimal sample size for Gaussian designs

It is proved that the debiased estimator is asymptotically Gaussian under the nearly optimal condition $s_0 = o(n/(\log p)^2)$, and a new estimator is proposed that is minimax optimal up to a factor $1+o_n(1)$ for i.i.d. Gaussian designs.

Phase Transition and Regularized Bootstrap in Large Scale $t$-tests with False Discovery Rate Control

Applying the Benjamini and Hochberg (B-H) method to multiple Student's $t$ tests is a popular technique in gene selection in microarray data analysis. Because of the non-normality of the population, the…

Robust inference with knockoffs

The results, which are free of any modeling assumption whatsoever, show that the resulting model selection procedure incurs an inflation of the false discovery rate that is proportional to the errors in estimating the distribution of each feature $X_j$ conditional on the remaining features $\{X_k : k\neq j\}$.

Controlling the false discovery rate via knockoffs

The knockoff filter is introduced, a new variable selection procedure controlling the FDR in the statistical linear model whenever there are at least as many observations as variables, and empirical results show that the resulting method has far more power than existing selection rules when the proportion of null variables is high.
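For intuition, the knockoff filter's data-dependent threshold on feature statistics $W_j$ (large positive values indicating evidence for a true effect) can be sketched as follows; `offset=1` corresponds to the knockoffs+ variant, and the function names here are illustrative, not from any particular package.

```python
def knockoff_threshold(W, q=0.1, offset=1):
    """Data-dependent knockoff(+) threshold on feature statistics W.

    Picks the smallest t such that the estimated false discovery
    proportion (offset + #{W_j <= -t}) / max(#{W_j >= t}, 1) is at
    most q. offset=1 gives the knockoffs+ rule; offset=0 the original.
    """
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        neg = sum(1 for w in W if w <= -t)
        pos = sum(1 for w in W if w >= t)
        if (offset + neg) / max(pos, 1) <= q:
            return t
    return float('inf')  # no feasible threshold: select nothing

def knockoff_select(W, q=0.1):
    """Indices of features whose statistic clears the knockoff+ threshold."""
    t = knockoff_threshold(W, q)
    return [j for j, w in enumerate(W) if w >= t]
```

With statistics like `[6.0, 5.0, 4.0, 3.0, 2.0, 1.0, -0.5, 0.2, -0.1, 0.0]` and `q=0.2`, the rule keeps the six clearly positive coordinates and discards the rest.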

Control of the False Discovery Rate Under Arbitrary Covariance Dependence

This paper derives the theoretical distribution of the false discovery proportion (FDP) in large-scale multiple testing when a common threshold is used, provides a consistent FDP estimate, and proposes a new methodology based on principal factor approximation, which successfully subtracts the common dependence and significantly weakens the correlation structure.

Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition

This paper studies the `Gauss-Lasso' selector, a simple two-stage method that first solves the Lasso and then performs ordinary least squares restricted to the Lasso active set, and formulates the `generalized irrepresentability condition' (GIC), an assumption that is substantially weaker than irrepresentability.

Nearly optimal sample size in hypothesis testing for high-dimensional regression

  • Adel Javanmard, A. Montanari
  • Computer Science, Mathematics
    2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton)
  • 2013
This work proposes a special debiasing method that is well suited for random designs with sparse inverse covariance and yields nearly optimal average testing power if the sample size $n$ asymptotically dominates $s_0(\log p)^2$, with $s_0$ being the sparsity level (number of non-zero coefficients).

A knockoff filter for high-dimensional selective inference

It is proved that the high-dimensional knockoff procedure 'discovers' important variables as well as the directions (signs) of their effects, in such a way that the expected proportion of wrongly chosen signs is below the user-specified level.

RANK: Large-Scale Inference With Graphical Nonlinear Knockoffs

It is established that under mild regularity conditions, the power of the oracle knockoffs procedure with known covariate distribution in high-dimensional linear models is asymptotically one as sample size goes to infinity.

On Model Selection Consistency of Lasso

It is proved that a single condition, called the Irrepresentable Condition, is almost necessary and sufficient for the Lasso to select the true model, both in the classical fixed-$p$ setting and in the large-$p$ setting as the sample size $n$ grows.
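For reference, the strong irrepresentable condition is commonly stated as follows, where $S$ is the support of the true coefficient vector $\beta$, $\Sigma$ is the design covariance (Gram matrix) partitioned by $S$ and its complement $S^c$, and $\eta > 0$ is a constant:

```latex
\left\| \Sigma_{S^c S} \, \Sigma_{S S}^{-1} \, \operatorname{sign}(\beta_S) \right\|_\infty \le 1 - \eta
```

Informally, the irrelevant columns of the design must not be too correlated with the relevant ones, or the Lasso can be fooled into selecting nulls.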