Pathway-Based Kernel Boosting for the Analysis of Genome-Wide Association Studies

Stefanie Friedrichs, Juliane Manitz, Patricia Burger, Christopher I. Amos, Angela Risch, Jenny Chang-Claude, Heinz-Erich Wichmann, Thomas Kneib, Heike Bickeböller, Benjamin Hofner
Computational and Mathematical Methods in Medicine

The analysis of genome-wide association studies (GWAS) benefits from the investigation of biologically meaningful gene sets, such as gene-interaction networks (pathways). We propose an extension to a successful kernel-based pathway analysis approach by integrating kernel functions into a powerful algorithmic framework for variable selection, to enable investigation of multiple pathways simultaneously. We employ genetic similarity kernels from the logistic kernel machine test (LKMT) as base… 
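
To make the idea of using pathway kernels as base learners concrete, the following is a minimal sketch, not the authors' implementation. It assumes a column-centered linear kernel as a stand-in for the LKMT genetic similarity kernels, squared-error loss instead of a loss for case-control outcomes, and a kernel ridge smoother as the base learner. The names `centered_linear_kernel` and `kernel_boost` and the parameters `nu` (step length) and `lam` (ridge penalty) are hypothetical.

```python
import numpy as np


def centered_linear_kernel(Z):
    """Similarity matrix K = Zc Zc^T from a genotype matrix Z (samples x SNPs)."""
    Zc = Z - Z.mean(axis=0)  # column-centering is an assumption of this sketch
    return Zc @ Zc.T


def kernel_boost(kernels, y, n_iter=20, nu=0.1, lam=1.0):
    """Component-wise boosting with one kernel base learner per pathway.

    Assumption: squared-error loss, so the negative gradient is the residual.
    Returns the fitted values and the index of the pathway chosen per iteration.
    """
    n = len(y)
    identity = np.eye(n)
    # One kernel ridge smoother per pathway: H = K (K + lam * I)^{-1}.
    smoothers = [K @ np.linalg.inv(K + lam * identity) for K in kernels]
    fit = np.full(n, y.mean())  # start from the offset (mean response)
    selected = []
    for _ in range(n_iter):
        resid = y - fit
        # Fit every pathway's base learner to the current residuals.
        candidates = [H @ resid for H in smoothers]
        rss = [np.sum((resid - c) ** 2) for c in candidates]
        best = int(np.argmin(rss))  # pathway whose learner fits best
        fit = fit + nu * candidates[best]  # small step towards that learner
        selected.append(best)
    return fit, selected
```

Because only one pathway is updated per iteration, pathways that are never chosen drop out of the model, which is how a boosting scheme of this kind can select among several pathways simultaneously. Counting how often each index appears in `selected` gives a rough picture of which pathways drive the fit.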

A Pathway-Based Kernel Boosting Method for Sample Classification Using Genomic Data

This article proposes a Pathway-based Kernel Boosting (PKB) method for integrating gene pathway information for sample classification, where kernel functions calculated from each pathway are used as base learners and learn the weights through iterative optimization of the classification loss function.

Kernel-Based Pathway Approaches for Testing and Selection

The development of a new method in the evaluation of SNP sets, focussing on the analysis of those representing pathways, has great potential to elucidate key biological functions involved in disease risk, while creating a directly interpretable model to predict disease status.

A general kernel boosting framework integrating pathways for predictive modeling based on genomic data

Through extensive simulations and case studies, it is demonstrated that PKB can substantially outperform other competing methods, better identify biological pathways related to drug response and patient survival, and provide novel insights into cancer pathogenesis and treatment response.

Improving stability of prediction models based on correlated omics data by using network approaches

A novel strategy for model selection is proposed in which the obtained models also perform well in terms of overall predictability, and recommendations are provided for selecting a strategy for building a prediction model given the specific goal of the analysis and the sizes of the datasets.

Review of Genetic Variation as a Predictive Biomarker for Chronic Graft-Versus-Host-Disease After Allogeneic Stem Cell Transplantation

It is concluded that future studies should focus on modern genome-level tools, such as machine learning, polygenic risk scores and genome-wide association study-transcription meta-analyses, instead of focusing on just single variants.

Manifold regularization based on Nyström type subsampling

Biological pathway analysis and a genome-wide association study identified genetic factors on chromosome 10 associated with anthocyanin variation in potato

This study proposes integrating the biological pathway approach into a genetic association study of anthocyanin compounds in potato.

An Update on Statistical Boosting in Biomedicine

This review article highlights the most recent methodological developments on statistical boosting regarding variable selection, functional regression, and advanced time-to-event modelling.



A Network-Based Kernel Machine Test for the Identification of Risk Pathways in Genome-Wide Association Studies

This study proposes a novel kernel that incorporates the topology of pathways and information on interactions and applies it to genome-wide association case-control data on lung cancer and rheumatoid arthritis to identify some promising new pathways associated with these diseases.

A Novel Kernel for Correcting Size Bias in the Logistic Kernel Machine Test with an Application to Rheumatoid Arthritis

A novel kernel that includes an appropriate standardization in order to protect against any inflation of false positive results is proposed and it is found that even this basic genomic structure can improve the ability of the LKMT to identify meaningful associations.

Gene network-based cancer prognosis analysis with sparse boosting.

A simulation study shows that NSBoost identifies cancer-associated genes and modules more accurately than alternatives and outperforms them, including boosting and penalization approaches, by identifying fewer genes/modules and/or achieving better prediction performance.

Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models

It is shown that kernel machine estimation of the model components can be formulated using a logistic mixed model, and hence can proceed within a mixed model framework using standard statistical software.

Network-based model weighting to detect multiple loci influencing complex diseases

W. Pan. Biology. Human Genetics, 2008.
A gene network-based method to improve statistical power over that of the exhaustive search for DNA variants associated with disease susceptibility by giving higher weights to models involving genes nearby in a network is proposed.

Network models of genome-wide association studies uncover the topological centrality of protein interactions in complex diseases

Virtually all common diseases are complex human traits, and thus the topological centrality in protein networks of complex trait genes has implications in genetics, personal genomics, and therapy.

Testing the additional predictive value of high-dimensional molecular data

A simple and computationally efficient approach can be used to globally assess the additional predictive power of a large number of candidate predictors given that a few clinical covariates or a known prognostic index are already available.

Boosting in structured additive models

It is shown that variable selection may be biased if the base-learners have different degrees of flexibility, both for categorical covariates and for smooth effects of continuous covariates, and a framework for unbiased model selection based on a general class of penalized least squares base-learners is suggested.

Comparing strategies for combined testing of rare and common variants in whole sequence and genome-wide genotype data

This work used the extension of the kernel score test to family data to analyze real and simulated baseline systolic blood pressure in extended pedigrees and found that defining the gene region based on linkage disequilibrium blocks often yielded robust power for joint tests of rare and common markers.