Genome-Wide Genetic Analysis Using Genetic Programming: The Critical Need for Expert Knowledge

@inproceedings{Moore2007GenomeWideGA,
  title={Genome-Wide Genetic Analysis Using Genetic Programming: The Critical Need for Expert Knowledge},
  author={Jason H. Moore and Bill C. White},
  year={2007}
}
Human genetics is undergoing an information explosion. The availability of chip-based technology facilitates the measurement of thousands of DNA sequence variations from across the human genome. The challenge is to sift through these high-dimensional datasets to identify combinations of interacting DNA sequence variations that are predictive of common diseases. The goal of this study is to develop and evaluate a genetic programming (GP) approach to attribute selection and classification in this… 

Exploiting Expert Knowledge in Genetic Programming for Genome-Wide Genetic Analysis

TLDR
This study demonstrates that GP may be a useful computational discovery tool in this domain and shows that using expert knowledge to select trees performs as well as a multiobjective fitness function but requires only a tenth of the population size.

An Expert Knowledge-Guided Mutation Operator for Genome-Wide Genetic Analysis Using Genetic Programming

TLDR
This study demonstrates that in the context of an expert knowledge aware GP, mutation may be an appropriate component of the GP used to search for interacting predictors in this domain.

Solving Complex Problems in Human Genetics using Nature-Inspired Algorithms Requires Strategies which Exploit Domain-Specific Knowledge

  • C. Greene, J. Moore
  • Computer Science
    Nature-Inspired Informatics for Intelligent Applications and Knowledge Discovery
  • 2010
TLDR
The GP and ACO techniques are designed to select relevant attributes, while the CES addresses both the selection of relevant attributes and the modeling of disease risk; the authors examine these methods in the context of epistasis, or gene-gene interactions.

GP-Pi: Using Genetic Programming with Penalization and Initialization on Genome-Wide Association Study

TLDR
A penalizing term in the fitness function to penalize trees with common SNPs and an initializer which uses expert knowledge to seed the population with good attributes are introduced; the results suggest that GP-Pi outperforms GPAS with statistical significance.

Sensible initialization using expert knowledge for genome-wide analysis of epistasis using genetic programming

TLDR
It is shown that the expert-knowledge-aware probabilistic initializer significantly outperforms both the random initializer and the enumerative initializer for this domain.

Sensible Initialization of a Computational Evolution System Using Expert Knowledge for Epistasis Analysis in Human Genetics

TLDR
The results demonstrate that incorporating linkage learning in population initialization via expert knowledge sources improves classification accuracy, enhancing the ability to automate the discovery and characterization of the genetic causes of common human diseases.

Optimal Use of Expert Knowledge in Ant Colony Optimization for the Analysis of Epistasis in Human Disease

TLDR
This work has integrated an ACO stochastic search wrapper into the open source MDR software package, and introduces a scaling method based on an exponential distribution function with a single user-adjustable parameter.

Nature-inspired algorithms for the genetic analysis of epistasis in common human diseases: Theoretical assessment of wrapper vs. filter approaches

TLDR
It is discovered that for this problem, expert knowledge is critical if the authors are to discover nonlinear gene-gene interactions, and under certain assumptions, the filter strategy leads to the highest power.

Exploiting Expert Knowledge of Protein-Protein Interactions in a Computational Evolution System for Detecting Epistasis

TLDR
The ability to incorporate biological knowledge into learning algorithms is an essential step toward the routine use of methods such as CES for identifying genetic risk factors for common human diseases.

Environmental Sensing of Expert Knowledge in a Computational Evolution System for Complex Problem Solving in Human Genetics

TLDR
This study shows that the computational evolution system (CES) developed here is capable of evolving operators which exploit one of several sources of expert knowledge to solve the problem. This is important both for the discovery of highly fit genetic models and because the particular source of expert knowledge used by evolved operators may provide additional information about the problem itself.
...

References

The challenges of whole-genome approaches to common diseases (STUDENTJAMA).

TLDR
Powerful statistical and computational methods will need to be developed to model the relationship between combinations of SNPs and disease susceptibility, which suggests several challenges in identifying susceptibility genes from the entire human genome.

A statistical comparison of grammatical evolution strategies in the domain of human genetics

TLDR
This work simulated datasets with up to 6000 attributes using two different genetic models and statistically compared the performance of grammatical evolution, grammatical swarm, and random search for building symbolic discriminant functions and found no statistical difference among search algorithms.

A haplotype map of the human genome

TLDR
A public database of common variation in the human genome: more than one million single nucleotide polymorphisms for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted.

Multifactor dimensionality reduction software for detecting gene-gene and gene-environment interactions

TLDR
A multifactor dimensionality reduction (MDR) method for collapsing high-dimensional genetic data into a single dimension thus permitting interactions to be detected in relatively small sample sizes is developed.

The Ubiquitous Nature of Epistasis in Determining Susceptibility to Common Human Diseases

TLDR
A working hypothesis is formed that epistasis is a ubiquitous component of the genetic architecture of common human diseases and that complex interactions are more important than the independent main effects of any one susceptibility gene.

New strategies for identifying gene-gene interactions in hypertension

TLDR
The general problem of identifying gene-gene interactions is reviewed and several traditional and several newer methods that are being used to assess complex genetic interactions in essential hypertension are described.

Computational analysis of gene-gene interactions using multifactor dimensionality reduction

  • J. Moore
  • Biology
    Expert review of molecular diagnostics
  • 2004
TLDR
A novel strategy known as multifactor dimensionality reduction that was specifically designed for the identification of multilocus genetic effects is presented and several case studies that demonstrate the detection of gene–gene interactions in common diseases such as atrial fibrillation, Type II diabetes and essential hypertension are discussed.
...