# A General Framework for Estimation and Inference From Clusters of Features

@article{Reid2015AGF, title={A General Framework for Estimation and Inference From Clusters of Features}, author={Stephen Reid and Jonathan E. Taylor and Robert Tibshirani}, journal={Journal of the American Statistical Association}, year={2015}, volume={113}, pages={280 - 293} }

ABSTRACT Applied statistical problems often come with prespecified groupings to predictors. It is natural to test for the presence of simultaneous group-wide signal for groups in isolation, or for multiple groups together. Current tests for the presence of such signals include the classical F-test or a t-test on unsupervised group prototypes (either group centroids or first principal components). In this article, we propose test statistics that aim for power improvements over these classical…
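
As a rough illustration of one of the classical baselines the abstract mentions (a t-test on an unsupervised group prototype), the following minimal sketch computes a group centroid and the t-statistic for its slope in a simple regression. The function names `centroid_prototype` and `slope_t_statistic` and the simulated data are assumptions made for illustration; this is not the test statistic proposed in the paper.

```python
import math
import random


def centroid_prototype(columns):
    """Average the group's feature columns row by row (the group centroid)."""
    n = len(columns[0])
    return [sum(col[i] for col in columns) / len(columns) for i in range(n)]


def slope_t_statistic(x, y):
    """Return (t-statistic, degrees of freedom) for the slope in y ~ a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance estimate with n - 2 degrees of freedom.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    return slope / math.sqrt(sigma2 / sxx), n - 2


# Hypothetical simulated data: three correlated features sharing one latent signal.
rng = random.Random(0)
n = 50
latent = [rng.gauss(0, 1) for _ in range(n)]
group = [[z + rng.gauss(0, 0.5) for z in latent] for _ in range(3)]
y = [2.0 * z + rng.gauss(0, 1) for z in latent]

prototype = centroid_prototype(group)
t_stat, dof = slope_t_statistic(prototype, y)
print(f"t = {t_stat:.2f} on {dof} degrees of freedom")
```

The statistic would be compared against a t distribution with `dof` degrees of freedom to obtain a p-value; that step is omitted here to keep the sketch dependency-free.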

#### 17 Citations

Post-selection estimation and testing following aggregate association tests

- Computer Science, Mathematics
- Journal of the Royal Statistical Society: Series B (Statistical Methodology)
- 2019

This work develops a general approach for valid inference following selection by aggregate testing, provides efficient algorithms for estimating the post-selection maximum-likelihood estimates, and suggests confidence intervals that rely on a novel switching regime for good coverage guarantees.

kernelPSI: a Post-Selection Inference Framework for Nonlinear Variable Selection

- Computer Science
- ICML
- 2019

This work exploits recent advances in post-selection inference to propose a valid statistical test for the association of a joint model of the selected kernels with the outcome.

High Dimensional Estimation and Multi-Factor Models

- Economics, Mathematics
- 2018

The purpose of this paper is to test a multi-factor model for realized returns implied by the generalized arbitrage pricing theory (APT) recently developed by Jarrow and Protter (2016) and Jarrow…

Detection of epistasis in genome wide association studies with machine learning methods for therapeutic target identification

- Computer Science
- 2020

The developed tools are the first to extend powerful statistical learning frameworks such as causal inference and nonlinear post-selection inference to GWAS, and special emphasis was placed on biological interpretation to validate the findings in multiple sclerosis and body-mass index variations.

Machine learning tools for biomarker discovery

- Computer Science
- 2020

The goal is to propose computational tools that can exploit data sets to extract biological hypotheses that explain, at a genomic or molecular level, the differences between samples that can be observed at a macroscopic scale.

High-Dimensional Estimation, Basis Assets, and the Adaptive Multi-Factor Model

- Computer Science
- 2018

The paper proposes a new algorithm for high-dimensional financial data, the Groupwise Interpretable Basis Selection (GIBS) algorithm, to estimate a new Adaptive Multi-Factor (AMF) asset pricing…

Geographically and Temporally Weighted Likelihood Regression: Exploring the Spatiotemporal Determinants of Land Use Change

- 2013

Urban areas possess complex spatial configurations. These patterns are produced by cumulative changes in land use and land cover as human and natural environments are influenced by market forces,…

Geographically weighted regression with a non-Euclidean distance metric: a case study using hedonic house price data

- Geography, Computer Science
- Int. J. Geogr. Inf. Sci.
- 2014

The results indicate that GWR calibrated with a non-Euclidean metric can not only improve model fit, but also provide additional and useful insights into the nature of varying relationships within the house price data set.

The Economy as a Complex Spatial System

- 2018

This collected volume gives a concise account of the most relevant scientific results of the COST Action IS1104 "The EU in the new complex geography of economic systems: models, tools and policy…

The spatial spillover effects of haze pollution on inbound tourism: evidence from mid-eastern China

- Geography
- Tourism Geographies
- 2019

Although previous studies have paid much attention to the impact of haze on inbound tourism, there is little empirical research on intercity differences in the spatial effects of haze…

#### References

Sparse regression and marginal testing using cluster prototypes.

- Computer Science, Mathematics
- Biostatistics
- 2016

A new approach for sparse regression and marginal testing, for data with correlated features, that uses the post-selection inference theory of Taylor and others to compute exact p-values and confidence intervals that properly account for the selection of prototypes.

Correlated variables in regression: Clustering and sparse estimation

- Mathematics
- 2013

We consider estimation in a high-dimensional linear model with strongly correlated variables. We propose to cluster the variables first and do subsequent sparse estimation such as the Lasso for…

Model selection and estimation in regression with grouped variables

- Mathematics
- 2006

Summary. We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations with the multifactor…

Exact Post Model Selection Inference for Marginal Screening

- Computer Science, Mathematics
- NIPS
- 2014

A framework for post model selection inference, via marginal screening, in linear regression is developed that characterizes the exact distribution of linear functions of the response $y$, conditional on the model being selected (the "condition on selection" framework).

STANDARDIZATION AND THE GROUP LASSO PENALTY.

- Computer Science, Medicine
- Statistica Sinica
- 2012

The efficacy of this method, the "standardized Group Lasso", over the usual group lasso on real and simulated data sets is demonstrated, and it is shown that it is intimately related to the uniformly most powerful invariant test for inclusion of a group.

Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR.

- Mathematics, Medicine
- Biometrics
- 2008

A new method called the OSCAR (octagonal shrinkage and clustering algorithm for regression) is proposed to simultaneously select variables while grouping them into predictive clusters, in addition to improving prediction accuracy and interpretation.

Sparse regression with exact clustering

- Mathematics
- 2008

This dissertation deals with three closely related topics of the lasso in addition to supplying a comprehensive overview of the rapidly growing literature in this field. The first part aims at…

Inference in adaptive regression via the Kac–Rice formula

- Mathematics
- 2016

We derive an exact p-value for testing a global null hypothesis in a general adaptive regression setting. Our approach uses the Kac-Rice formula (as described in Adler & Taylor 2007)…

Group testing and sparse signal recovery

- Computer Science
- 2008 42nd Asilomar Conference on Signals, Systems and Computers
- 2008

Examining the relationship between group testing and compressive sensing, along with their applications and connections to sparse function learning, finds that many of the same techniques that are useful for designing tests are also used to solve algorithmic problems in compressed sensing.

A Sparse-Group Lasso

- Mathematics
- 2013

For high-dimensional supervised learning problems, often using problem-specific assumptions can lead to greater accuracy. For problems with grouped covariates, which are believed to have sparse…