False Discovery Rate Control via Debiased Lasso
@article{Javanmard2018FalseDR,
  title   = {False Discovery Rate Control via Debiased Lasso},
  author  = {Adel Javanmard and Hamid Javadi},
  journal = {ArXiv},
  volume  = {abs/1803.04464},
  year    = {2018}
}
We consider the problem of variable selection in high-dimensional statistical models, where the goal is to report a set of variables, out of many predictors $X_1, \dotsc, X_p$, that are relevant to a response of interest. For the linear high-dimensional model, where the number of parameters exceeds the number of samples $(p>n)$, we propose a procedure for variable selection and prove that it controls the \emph{directional} false discovery rate (FDR) below a pre-assigned significance level $q\in [0…
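The abstract describes a debias-then-test pipeline. A minimal sketch of that idea, under strong simplifying assumptions (i.i.d. standard Gaussian design, so the precision matrix is the identity and the debiasing matrix can be taken as $M = I$), is shown below. This is an illustration of the general technique, not the paper's exact procedure; all constants and names are illustrative.

```python
# Hedged sketch (NOT the paper's exact method): lasso fit, one debiasing
# step with M = I, coordinate z-scores, then Benjamini-Hochberg step-up
# on the two-sided p-values, reporting signed discoveries.
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s0, q = 500, 100, 3, 0.1

X = rng.standard_normal((n, p))               # i.i.d. N(0, 1) design
theta = np.zeros(p)
theta[:s0] = 3.0                              # a few strong positive signals
y = X @ theta + rng.standard_normal(n)

# Lasso at the universal rate lambda ~ sigma * sqrt(2 log p / n)
lam = 2.0 * np.sqrt(2.0 * np.log(p) / n)
theta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_

# One debiasing step with M = I; in general M estimates the precision
# matrix, e.g. via node-wise lasso regressions.
resid = y - X @ theta_hat
theta_d = theta_hat + X.T @ resid / n

sigma_hat = np.linalg.norm(resid) / np.sqrt(n)  # crude noise estimate
z = np.sqrt(n) * theta_d / sigma_hat
pvals = 2.0 * norm.sf(np.abs(z))

# Benjamini-Hochberg step-up at level q; report discoveries with signs
order = np.argsort(pvals)
passed = np.nonzero(pvals[order] <= q * np.arange(1, p + 1) / p)[0]
selected = np.sort(order[: passed[-1] + 1]) if passed.size else np.array([], dtype=int)
signs = np.sign(theta_d[selected])
print(selected, signs)
```

Because the design covariance is the identity here, the single correction step `X.T @ resid / n` removes most of the lasso shrinkage bias; with a general design this step would use an estimated precision matrix instead.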
42 Citations
Relaxing the assumptions of knockoffs by conditioning
- Mathematics
- 2019
The recent paper Candès et al. (2018) introduced model-X knockoffs, a method for variable selection that provably and non-asymptotically controls the false discovery rate with no restrictions or…
Directional FDR Control for Sub-Gaussian Sparse GLMs
- Computer Science, Mathematics
- 2021
It is shown that the proposed debiased statistics can asymptotically control the directional (sign) FDR and directional false discovery variables at a pre-specified significance level for two-sample problems.
A Scale-free Approach for False Discovery Rate Control in Generalized Linear Models
- Mathematics, Journal of the American Statistical Association
- 2023
The generalized linear models (GLM) have been widely used in practice to model non-Gaussian response variables. When the number of explanatory features is relatively large, scientific researchers are…
Online Debiasing for Adaptively Collected High-dimensional Data
- Computer Science, Journal of the American Statistical Association
- 2021
It is demonstrated that online debiasing optimally debiases the LASSO estimate when the underlying parameter $\theta_0$ has sparsity of order $o(\sqrt{n}/\log p)$.
False Discovery Rate Control Under General Dependence By Symmetrized Data Aggregation
- Computer Science, Journal of the American Statistical Association
- 2021
The proposed SDA filter first constructs a sequence of ranking statistics that fulfill global symmetry properties, and then chooses a data-driven threshold along the ranking to control the FDR; the asymptotic validity of SDA for both FDR and false discovery proportion (FDP) control is established under mild regularity conditions.
Stepdown SLOPE for Controlled Feature Selection
- Computer Science
- 2023
Two new SLOPE variants are proposed for controlled high-dimensional feature selection, adaptively imposing a non-increasing sequence of tuning parameters on the Sorted L-One Penalized Estimation; the stepdown-based SLOPE controls the probability of false rejections and the false discovery proportion.
Inference in Sparsity-Induced Weak Factor Models
- Economics, Journal of Business & Economic Statistics
- 2021
Abstract In this article, we consider statistical inference for high-dimensional approximate factor models. We posit a weak factor structure, in which the factor loading matrix can be sparse and the…
Global and Simultaneous Hypothesis Testing for High-Dimensional Logistic Regression Models
- Mathematics, Journal of the American Statistical Association
- 2021
Global testing and large-scale multiple testing for the regression coefficients are considered in both single- and two-regression settings and a lower bound for the global testing is established, which shows that the proposed test is asymptotically minimax optimal over some sparsity range.
Inference in Weak Factor Models
- Economics
- 2020
In this paper, we consider statistical inference for high-dimensional approximate factor models. We posit a weak factor structure, in which the factor loading matrix can be sparse and the signal…
Power of FDR Control Methods: The Impact of Ranking Algorithm, Tampered Design, and Symmetric Statistic
- Computer Science
- 2020
Adopting the Rare/Weak signal model, popular in the multiple testing and variable selection literature, the rates of convergence of the number of false positives and the number of false negatives of FDR control methods are characterized for particular classes of designs.
References
Showing 1-10 of 63 references
Debiasing the lasso: Optimal sample size for Gaussian designs
- Computer Science, Mathematics, The Annals of Statistics
- 2018
It is proved that the debiased estimator is asymptotically Gaussian under the nearly optimal condition $s_0 = o(n/(\log p)^2)$, and a new estimator is proposed that is minimax optimal up to a factor $1+o_n(1)$ for i.i.d. Gaussian designs.
Phase Transition and Regularized Bootstrap in Large Scale $t$-tests with False Discovery Rate Control
- Mathematics
- 2013
Applying Benjamini and Hochberg (B-H) method to multiple Student's $t$ tests is a popular technique in gene selection in microarray data analysis. Because of the non-normality of the population, the…
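For reference, the Benjamini and Hochberg (B-H) step-up rule mentioned in this entry can be sketched in a few lines; the p-values below are synthetic placeholders, not drawn from any real $t$-tests.

```python
# Minimal Benjamini-Hochberg step-up procedure over a vector of p-values.
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return the indices rejected by the BH step-up rule at level q."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    # Largest k with p_(k) <= q * k / m; reject the k smallest p-values.
    passed = np.nonzero(p[order] <= q * np.arange(1, m + 1) / m)[0]
    if passed.size == 0:
        return np.array([], dtype=int)
    return np.sort(order[: passed[-1] + 1])

# Five strong non-nulls among 100 hypotheses
pv = np.concatenate([np.full(5, 1e-6), np.linspace(0.05, 1.0, 95)])
rejected = benjamini_hochberg(pv, q=0.05)
print(rejected)  # → [0 1 2 3 4]
```

Under independence (and certain positive dependence), this rule controls the FDR at level $q \cdot m_0 / m$, where $m_0$ is the number of true nulls.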
Robust inference with knockoffs
- Computer Science, Mathematics, The Annals of Statistics
- 2020
The results, which are free of any modeling assumption whatsoever, show that the resulting model selection procedure incurs an inflation of the false discovery rate that is proportional to the errors incurred in estimating the distribution of each feature $X_j$ conditional on the remaining features $\{X_k : k \neq j\}$.
Controlling the false discovery rate via knockoffs
- Computer Science
- 2015
The knockoff filter is introduced, a new variable selection procedure controlling the FDR in the statistical linear model whenever there are at least as many observations as variables, and empirical results show that the resulting method has far more power than existing selection rules when the proportion of null variables is high.
Control of the False Discovery Rate Under Arbitrary Covariance Dependence
- Computer Science
- 2010
This paper derives the theoretical distribution of the false discovery proportion (FDP) in large-scale multiple testing when a common threshold is used, provides a consistent estimate of the FDP, and proposes a new methodology based on principal factor approximation, which successfully subtracts the common dependence and significantly weakens the correlation structure.
Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition
- Mathematics, Computer Science, NIPS
- 2013
This paper studies the 'Gauss-Lasso' selector, a simple two-stage method that first solves the Lasso and then performs ordinary least squares restricted to the Lasso active set, and formulates the 'generalized irrepresentability condition' (GIC), an assumption that is substantially weaker than irrepresentability.
Nearly optimal sample size in hypothesis testing for high-dimensional regression
- Computer Science, Mathematics, 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton)
- 2013
This work proposes a special debiasing method that is well suited for random designs with sparse inverse covariance and yields nearly optimal average testing power if the sample size $n$ asymptotically dominates $s_0 (\log p)^2$, with $s_0$ being the sparsity level (number of non-zero coefficients).
A knockoff filter for high-dimensional selective inference
- Computer Science, Mathematics, The Annals of Statistics
- 2019
It is proved that the high-dimensional knockoff procedure 'discovers' important variables as well as the directions (signs) of their effects, in such a way that the expected proportion of wrongly chosen signs is below the user-specified level.
RANK: Large-Scale Inference With Graphical Nonlinear Knockoffs
- Mathematics, Computer Science, Journal of the American Statistical Association
- 2020
It is established that under mild regularity conditions, the power of the oracle knockoffs procedure with known covariate distribution in high-dimensional linear models is asymptotically one as sample size goes to infinity.
On Model Selection Consistency of Lasso
- Computer Science, J. Mach. Learn. Res.
- 2006
It is proved that a single condition, which is called the Irrepresentable Condition, is almost necessary and sufficient for Lasso to select the true model both in the classical fixed p setting and in the large p setting as the sample size n gets large.