A General Framework for Estimation and Inference From Clusters of Features

Stephen Reid, Jonathan E. Taylor and Robert Tibshirani. Journal of the American Statistical Association, pages 280–293.
ABSTRACT Applied statistical problems often come with prespecified groupings of predictors. It is natural to test for the presence of simultaneous group-wide signal for groups in isolation, or for multiple groups together. Current tests for the presence of such signals include the classical F-test or a t-test on unsupervised group prototypes (either group centroids or first principal components). In this article, we propose test statistics that aim for power improvements over these classical…
Post-selection estimation and testing following aggregate association tests
This work develops a general approach for valid inference following selection by aggregate testing, provides efficient algorithms for computing the post-selection maximum-likelihood estimates, and suggests confidence intervals that rely on a novel switching regime for good coverage guarantees.
kernelPSI: a Post-Selection Inference Framework for Nonlinear Variable Selection
This work exploits recent advances in post-selection inference to propose a valid statistical test for the association of a joint model of the selected kernels with the outcome.
High Dimensional Estimation and Multi-Factor Models
The purpose of this paper is to test a multi-factor model for realized returns implied by the generalized arbitrage pricing theory (APT) recently developed by Jarrow and Protter (2016) and Jarrow…
Detection of epistasis in genome wide association studies with machine learning methods for therapeutic target identification
The developed tools are the first to extend powerful statistical learning frameworks such as causal inference and nonlinear post-selection inference to GWAS, with special emphasis placed on biological interpretation to validate the findings in multiple sclerosis and body-mass-index variation.
Machine learning tools for biomarker discovery
The goal is to propose computational tools that can exploit data sets to extract biological hypotheses that explain, at a genomic or molecular level, the differences between samples that can be observed at a macroscopic scale.
High-Dimensional Estimation, Basis Assets, and the Adaptive Multi-Factor Model
The paper proposes a new algorithm for high-dimensional financial data, the Groupwise Interpretable Basis Selection (GIBS) algorithm, to estimate a new Adaptive Multi-Factor (AMF) asset pricing…
Geographically and Temporally Weighted Likelihood Regression: Exploring the Spatiotemporal Determinants of Land Use Change
Urban areas possess complex spatial configurations. These patterns are produced by cumulative changes in land use and land cover as human and natural environments are influenced by market forces…
Geographically weighted regression with a non-Euclidean distance metric: a case study using hedonic house price data
The results indicate that GWR calibrated with a non-Euclidean metric can not only improve model fit, but also provide additional and useful insights into the nature of varying relationships within the house price data set.
The Economy as a Complex Spatial System
This collected volume gives a concise account of the most relevant scientific results of the COST Action IS1104 "The EU in the new complex geography of economic systems: models, tools and policy…
The spatial spillover effects of haze pollution on inbound tourism: evidence from mid-eastern China
Abstract: Although previous studies have paid much attention to the impact of haze on inbound tourism, there is little empirical research on intercity differences in the spatial effects of haze…


Sparse regression and marginal testing using cluster prototypes.
A new approach for sparse regression and marginal testing, for data with correlated features, which uses the post-selection inference theory of Taylor and others to compute exact p-values and confidence intervals that properly account for the selection of prototypes.
Correlated variables in regression: Clustering and sparse estimation
We consider estimation in a high-dimensional linear model with strongly correlated variables. We propose to cluster the variables first and do subsequent sparse estimation such as the Lasso for…
Model selection and estimation in regression with grouped variables
Summary. We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations with the multifactor…
Exact Post Model Selection Inference for Marginal Screening
A framework for post model selection inference, via marginal screening, in linear regression is developed that characterizes the exact distribution of linear functions of the response $y$, conditional on the model being selected (the "condition on selection" framework).
Standardization and the Group Lasso Penalty
The efficacy of this method, the "standardized Group Lasso," over the usual group lasso on real and simulated data sets is demonstrated, and it is shown that it is intimately related to the uniformly most powerful invariant test for inclusion of a group.
Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR.
A new method called the OSCAR (octagonal shrinkage and clustering algorithm for regression) is proposed to simultaneously select variables while grouping them into predictive clusters, in addition to improving prediction accuracy and interpretation.
Sparse regression with exact clustering
This dissertation deals with three closely related topics of the lasso in addition to supplying a comprehensive overview of the rapidly growing literature in this field. The first part aims at…
Inference in adaptive regression via the Kac–Rice formula
Abstract: We derive an exact p-value for testing a global null hypothesis in a general adaptive regression setting. Our approach uses the Kac-Rice formula (as described in Adler & Taylor 2007)…
Group testing and sparse signal recovery
Examining the relationship between group testing and compressive sensing, along with their applications and connections to sparse function learning, shows that many of the same techniques that are useful for designing tests are also used to solve algorithmic problems in compressed sensing.
A Sparse-Group Lasso
For high-dimensional supervised learning problems, often using problem-specific assumptions can lead to greater accuracy. For problems with grouped covariates, which are believed to have sparse…