Corpus ID: 237563052

Cross-Leverage Scores for Selecting Subsets of Explanatory Variables

Katharina Parry, Leo N. Geppert, Alexander Munteanu, Katja Ickstadt

In a standard regression problem, we have a set of explanatory variables whose effect on some response vector is modeled. For wide binary data, such as genetic marker data, we often face two limitations. First, we have more parameters than observations. Second, main effects are not the main focus; instead, the primary aim is to uncover interactions between the binary variables that affect the response. Methods such as logic regression are able to find combinations of the explanatory variables…
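In logic regression, an interaction is expressed as a Boolean combination of binary covariates, and that combination then enters the regression model as a single derived covariate. The minimal sketch below evaluates one such term on toy wide data (p = 5 variables, n = 4 observations). The expression (X1 AND NOT X3) OR X5, the helper `boolean_term`, and the data are hypothetical and were chosen only for illustration. The sketch does not perform the search over candidate expressions that logic regression carries out.

```python
from collections.abc import Sequence


def boolean_term(row: Sequence[int]) -> int:
    """Evaluate the hypothetical term (X1 AND NOT X3) OR X5 on one observation.

    `row` holds binary covariates X1..X5 at indices 0..4.
    """
    x1, _, x3, _, x5 = (bool(v) for v in row)
    return int((x1 and not x3) or x5)


# Toy wide binary data (made up): 4 observations, 5 binary variables.
# In genetic marker data the variables would far outnumber the observations.
X = [
    [1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 0],
]

# The Boolean term becomes one derived covariate, so a single regression
# coefficient can capture an interaction among several binary variables.
L = [boolean_term(row) for row in X]
print(L)  # prints [1, 0, 1, 0]
```

Because the term is itself binary, it can be used like any other 0/1 covariate in a linear or logistic model.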

References

Logic Regression
Logic regression is an adaptive regression methodology that attempts to construct predictors as Boolean combinations of binary covariates. In many regression problems a model is developed that…
Influence Analysis of Generalized Least Squares Estimators
Influence analysis and leverage analysis are important and well-established adjuncts to ordinary least squares (OLS) regression, but analogous regression diagnostics are not generally…
Identification of interactions of binary variables associated with survival time using survivalFS
This article presents an ensemble method based on logic regression that can cope with the instability of the regression models generated by logic regression and introduces a new performance measure, which is an adaptation of Harrell’s concordance index.
Identification of SNP interactions using logic regression.
This paper shows how logic regression can be employed to identify SNP interactions explanatory for the disease status in a case-control study and proposes two measures for quantifying the importance of these interactions for classification.
Influential Observations, High Leverage Points, and Outliers in Linear Regression
A bewilderingly large number of statistical quantities have been proposed to study outliers and influence of individual observations in regression analysis. In this article we describe the…
Based on a multivariate linear regression model, we propose several generalizations to the multivariate classical and modified Cook’s distances in order to detect one or more of influential…
Fast approximation of matrix coherence and statistical leverage
A randomized algorithm is proposed that takes as input an arbitrary n × d matrix A, with n ≫ d, and returns, as output, relative-error approximations to all n of the statistical leverage scores.
Wrappers for Feature Subset Selection
This paper describes the wrapper method, which searches for an optimal feature subset tailored to a particular algorithm and domain, and compares the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection.
Random projections for Bayesian regression
The main result shows that the posterior distribution of Bayesian linear regression is approximated up to a small error depending on only an ε-fraction of its defining parameters.
Detection of Influential Observation in Linear Regression
R. Cook, Technometrics, 2000
A new measure based on confidence ellipsoids is developed for judging the contribution of each data point to the determination of the least squares estimate of the parameter vector in full rank…
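Several of the works listed above rely on leverage and on Cook's measure of influence. The minimal sketch below assumes a tall, full-column-rank design matrix A, which matches the n ≫ d setting of the leverage-approximation entry. It computes the hat matrix H = QQ^T from a reduced QR factorisation and takes its diagonal as the leverage scores. Following the usual definition in the matrix-coherence literature, it takes the off-diagonal entries as cross-leverage scores. It also derives Cook's distance from the standard formula D_i = e_i^2 h_ii / (d s^2 (1 - h_ii)^2). The function names and synthetic data are hypothetical. The computation is exact rather than the randomized approximation described above, and it forms the full n × n matrix, so it is only suited to small n. The sketch does not implement the paper's own use of cross-leverage scores for selecting explanatory variables.

```python
import numpy as np


def hat_matrix(A: np.ndarray) -> np.ndarray:
    """Return H = A (A^T A)^{-1} A^T for a tall, full-column-rank matrix A.

    Uses the reduced QR factorisation A = QR, so H = Q Q^T without inverting A^T A.
    Forms the full n x n matrix, so it is only meant for small n.
    """
    Q, _ = np.linalg.qr(A)  # default "reduced" mode: Q is n x d with orthonormal columns
    return Q @ Q.T


def leverage_scores(A: np.ndarray) -> np.ndarray:
    """Diagonal of the hat matrix: h_ii for each row i of A."""
    return np.diag(hat_matrix(A)).copy()


def cross_leverage_scores(A: np.ndarray) -> np.ndarray:
    """Off-diagonal entries h_ij (i != j) of the hat matrix, with the diagonal zeroed."""
    H = hat_matrix(A)
    return H - np.diag(np.diag(H))


def cooks_distance(A: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Cook's distance D_i = e_i^2 h_ii / (d s^2 (1 - h_ii)^2) for an OLS fit of y on A."""
    n, d = A.shape
    H = hat_matrix(A)
    h = np.diag(H)
    e = y - H @ y             # OLS residuals, since H y are the fitted values
    s2 = (e @ e) / (n - d)    # unbiased residual variance estimate
    return (e**2 / (d * s2)) * h / (1.0 - h) ** 2


# Usage on synthetic data: an intercept column plus two random covariates,
# with row 0 moved far from the others to give it high leverage.
rng = np.random.default_rng(0)
A = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
A[0, 1:] = [6.0, -6.0]
y = A @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.1, size=20)

h = leverage_scores(A)
print(h.sum())             # trace of H equals the number of columns, so about 3
print(int(np.argmax(h)))   # index of the highest-leverage row
print(cooks_distance(A, y).round(3))
```

Using QR instead of inverting A^T A directly avoids squaring the condition number of A. For large n, only the row norms of Q would be kept.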