Genome-Wide Genetic Analysis Using Genetic Programming: The Critical Need for Expert Knowledge

Jason H. Moore and Bill C. White
Human genetics is undergoing an information explosion. The availability of chip-based technology facilitates the measurement of thousands of DNA sequence variations from across the human genome. The challenge is to sift through these high-dimensional datasets to identify combinations of interacting DNA sequence variations that are predictive of common diseases. The goal of this study is to develop and evaluate a genetic programming (GP) approach to attribute selection and classification in this… 
Exploiting Expert Knowledge in Genetic Programming for Genome-Wide Genetic Analysis
This study demonstrates that GP may be a useful computational discovery tool in this domain and shows that using expert knowledge to select trees performs as well as a multiobjective fitness function but requires only a tenth of the population size.
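A minimal sketch of what expert-knowledge-based tree selection could look like, assuming each GP tree is reduced to the list of attributes (SNPs) at its leaves and that an expert-knowledge score has been precomputed for every attribute. The helper names `expert_score` and `tournament_select`, the mean-score rule, and the score values are hypothetical illustrations, not the method or data of the cited study.

```python
import random
from typing import Dict, List

# Hypothetical expert-knowledge scores per attribute (e.g. per SNP), assumed
# to be precomputed by some filter method before the GP run starts.
ExpertScores = Dict[str, float]


def expert_score(leaves: List[str], scores: ExpertScores) -> float:
    """Mean expert score of the attributes at a tree's leaves (assumed rule)."""
    if not leaves:
        return 0.0
    return sum(scores.get(a, 0.0) for a in leaves) / len(leaves)


def tournament_select(population: List[List[str]], scores: ExpertScores,
                      k: int, rng: random.Random) -> List[str]:
    """Sample k trees without replacement and return the best by expert score."""
    contestants = rng.sample(population, k)
    return max(contestants, key=lambda leaves: expert_score(leaves, scores))


if __name__ == "__main__":
    scores = {"snp1": 0.9, "snp2": 0.8, "snp3": 0.1, "snp4": 0.05}
    # Each tree is represented only by the attributes at its leaves.
    population = [["snp1", "snp2"], ["snp3", "snp4"], ["snp1", "snp3"]]
    # With k equal to the population size, the top-scoring tree always wins.
    print(tournament_select(population, scores, k=3, rng=random.Random(0)))
```

In a real run the tournament size would normally be much smaller than the population so that selection pressure stays moderate; the demo uses k=3 only to make the outcome deterministic.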
An Expert Knowledge-Guided Mutation Operator for Genome-Wide Genetic Analysis Using Genetic Programming
This study demonstrates that, in the context of an expert-knowledge-aware GP, mutation may be an appropriate component of the GP used to search for interacting predictors in this domain.
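The idea of an expert-knowledge-guided mutation can be sketched as a point mutation whose replacement attribute is drawn with probability proportional to its expert-knowledge score. This assumes trees are represented only by their leaf attributes and that the scores are nonnegative with a positive total; `guided_point_mutation` is a hypothetical helper, not the operator from the cited paper.

```python
import random
from typing import Dict, List


def guided_point_mutation(leaves: List[str], scores: Dict[str, float],
                          rng: random.Random) -> List[str]:
    """Return a mutated copy of `leaves` with one leaf replaced.

    The new attribute is drawn with probability proportional to its expert
    score (weights must be nonnegative with a positive sum), so high-scoring
    attributes are proposed more often than low-scoring ones.
    """
    child = list(leaves)  # leave the parent untouched
    position = rng.randrange(len(child))
    attributes = list(scores)
    weights = [scores[a] for a in attributes]
    child[position] = rng.choices(attributes, weights=weights, k=1)[0]
    return child


if __name__ == "__main__":
    scores = {"snp1": 0.9, "snp2": 0.8, "snp3": 0.1, "snp4": 0.05}
    rng = random.Random(1)
    # Count how often each attribute is proposed as the replacement leaf.
    proposed = {a: 0 for a in scores}
    for _ in range(10_000):
        child = guided_point_mutation(["x", "x"], scores, rng)
        new_leaf = next(a for a in child if a != "x")
        proposed[new_leaf] += 1
    print(proposed)
```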
Solving Complex Problems in Human Genetics using Nature-Inspired Algorithms Requires Strategies which Exploit Domain-Specific Knowledge
C. Greene and J. Moore. Nature-Inspired Informatics for Intelligent Applications and Knowledge Discovery, 2010.
The authors examine these methods in the context of epistasis, or gene-gene interactions: the genetic programming (GP) and ant colony optimization (ACO) techniques are designed to select relevant attributes, while the computational evolution system (CES) addresses both the selection of relevant attributes and the modeling of disease risk.
Ant Colony Optimization for Genome-Wide Genetic Analysis
The authors develop a prototype expert-knowledge-guided probabilistic search wrapper. The ACO approach is not successful in the absence of expert knowledge but succeeds when expert knowledge is supplied through the pheromone updating rule.
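To make "expert knowledge supplied through the pheromone updating rule" concrete, here is a toy ACO sketch: ants sample attribute subsets in proportion to pheromone, trails evaporate at rate `evaporation`, and each visited attribute receives a deposit equal to its expert-knowledge score. The deposit rule, parameter values, and function names are assumptions for illustration; a real wrapper would presumably also fold the classifier's fitness (for example, MDR accuracy) into the update.

```python
import random
from typing import Dict, List


def choose_subset(pheromone: Dict[str, float], size: int,
                  rng: random.Random) -> List[str]:
    """One ant: draw `size` distinct attributes, each draw weighted by pheromone."""
    remaining = dict(pheromone)
    chosen: List[str] = []
    for _ in range(size):
        attrs = list(remaining)
        pick = rng.choices(attrs, weights=[remaining[a] for a in attrs], k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen


def update_pheromone(pheromone: Dict[str, float], subsets: List[List[str]],
                     expert: Dict[str, float], evaporation: float) -> None:
    """Evaporate every trail, then deposit each visited attribute's expert score."""
    for a in pheromone:
        pheromone[a] *= 1.0 - evaporation
    for subset in subsets:
        for a in subset:
            pheromone[a] += expert[a]


def run_aco(expert: Dict[str, float], n_ants: int = 20, subset_size: int = 2,
            iterations: int = 30, evaporation: float = 0.1,
            seed: int = 0) -> Dict[str, float]:
    """Toy expert-knowledge-guided ACO; returns the final pheromone trails."""
    rng = random.Random(seed)
    pheromone = {a: 1.0 for a in expert}  # uniform starting trails
    for _ in range(iterations):
        subsets = [choose_subset(pheromone, subset_size, rng) for _ in range(n_ants)]
        update_pheromone(pheromone, subsets, expert, evaporation)
    return pheromone


if __name__ == "__main__":
    # Hypothetical expert scores: snp1 and snp2 are favored by prior knowledge.
    expert = {"snp1": 1.0, "snp2": 1.0, "snp3": 0.01, "snp4": 0.01}
    trails = run_aco(expert)
    print(sorted(trails, key=trails.get, reverse=True))
```

Because deposits reinforce whatever the ants visit, attributes with high expert scores quickly dominate the trails, which is the positive-feedback effect the summary attributes to supplying expert knowledge in the update.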
Sensible Initialization of a Computational Evolution System Using Expert Knowledge for Epistasis Analysis in Human Genetics
The results demonstrate that incorporating linkage learning in population initialization via expert knowledge sources improves classification accuracy, enhancing the ability to automate the discovery and characterization of the genetic causes of common human diseases.
Optimal Use of Expert Knowledge in Ant Colony Optimization for the Analysis of Epistasis in Human Disease
This work integrates an ACO stochastic search wrapper into the open-source MDR software package and introduces a scaling method based on an exponential distribution function with a single user-adjustable parameter.
Nature-inspired algorithms for the genetic analysis of epistasis in common human diseases: Theoretical assessment of wrapper vs. filter approaches
The authors find that, for this problem, expert knowledge is critical for discovering nonlinear gene-gene interactions and that, under certain assumptions, the filter strategy yields the highest power.
Exploiting Expert Knowledge of Protein-Protein Interactions in a Computational Evolution System for Detecting Epistasis
The ability to incorporate biological knowledge into learning algorithms is an essential step toward the routine use of methods such as CES for identifying genetic risk factors for common human diseases.
Environmental Sensing of Expert Knowledge in a Computational Evolution System for Complex Problem Solving in Human Genetics
This study shows that the computational evolution system (CES) developed here can evolve operators that exploit one of several sources of expert knowledge to solve the problem. This is important both for discovering highly fit genetic models and because the particular source of expert knowledge used by the evolved operators may provide additional information about the problem itself.
Genomic mining for complex disease traits with “random chemistry”
The authors propose a new evolutionary approach that attempts to hill-climb from large sets of candidate epistatic genetic features to smaller sets, inspired by Kauffman’s “random chemistry” approach to detecting small autocatalytic sets of molecules within large sets.
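The hill-climb from large to small sets can be illustrated with a toy reduction loop in the spirit of random chemistry: propose a random half of the current feature set and keep it only if a test still succeeds. Here the test is a hypothetical predicate (in the demo, "the subset still contains both interacting SNPs"); in a genomic setting it would be some fitness criterion, and the halving schedule and `max_tries` limit are assumptions of this sketch rather than details from the paper.

```python
import random
from typing import Callable, List, Sequence


def random_chemistry_reduce(features: Sequence[str],
                            still_works: Callable[[List[str]], bool],
                            target_size: int,
                            rng: random.Random,
                            max_tries: int = 1000) -> List[str]:
    """Shrink `features` by random halving while `still_works` stays true.

    Each step proposes a random subset of about half the current set and
    accepts it only if the predicate still holds. If no acceptable subset is
    found within `max_tries` attempts, the current set is returned.
    """
    current = list(features)
    if not still_works(current):
        raise ValueError("starting set must satisfy the predicate")
    while len(current) > target_size:
        new_size = max(target_size, len(current) // 2)
        for _ in range(max_tries):
            candidate = rng.sample(current, new_size)
            if still_works(candidate):
                current = candidate
                break
        else:
            break  # no smaller passing set found; stop climbing
    return current


if __name__ == "__main__":
    features = [f"snp{i}" for i in range(64)]
    interacting = {"snp7", "snp42"}  # hypothetical causal pair
    result = random_chemistry_reduce(
        features, lambda s: interacting <= set(s), target_size=2,
        rng=random.Random(0))
    print(sorted(result))
```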


The challenges of whole-genome approaches to common diseases (STUDENTJAMA)
Identifying susceptibility genes from the entire human genome poses several challenges; in particular, powerful statistical and computational methods will need to be developed to model the relationship between combinations of SNPs and disease susceptibility.
A haplotype map of the human genome
A public database of common variation in the human genome: more than one million single nucleotide polymorphisms for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted.
Multifactor dimensionality reduction software for detecting gene-gene and gene-environment interactions
This work describes software implementing the multifactor dimensionality reduction (MDR) method, which collapses high-dimensional genetic data into a single dimension and thus permits interactions to be detected in relatively small samples.
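The core MDR step can be sketched for a single pair of SNPs: each of the nine two-locus genotype cells is labeled high-risk when its case:control ratio exceeds a threshold, here taken to be the overall case:control ratio of the sample (1 for balanced data), and the pair is collapsed into one binary attribute. This is a simplified illustration under stated assumptions (genotypes coded 0/1/2, status 1 for cases and 0 for controls, empty cells treated as low-risk, no cross-validation, no search over SNP combinations), not code from the MDR software.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, int]  # (genotype at SNP A, genotype at SNP B), each 0/1/2


def mdr_high_risk_cells(snp_a: Sequence[int], snp_b: Sequence[int],
                        status: Sequence[int]) -> Dict[Cell, bool]:
    """Label each of the 9 two-locus genotype cells high-risk (True) or low-risk."""
    cases: Counter = Counter()
    controls: Counter = Counter()
    for a, b, y in zip(snp_a, snp_b, status):
        if y == 1:
            cases[(a, b)] += 1
        else:
            controls[(a, b)] += 1
    n_cases = sum(cases.values())
    n_controls = sum(controls.values())
    labels: Dict[Cell, bool] = {}
    for cell in product(range(3), repeat=2):
        # cases/controls > n_cases/n_controls, cross-multiplied to avoid
        # dividing by zero; empty cells compare 0 > 0 and become low-risk.
        labels[cell] = cases[cell] * n_controls > n_cases * controls[cell]
    return labels


def mdr_collapse(snp_a: Sequence[int], snp_b: Sequence[int],
                 labels: Dict[Cell, bool]) -> List[int]:
    """Collapse two SNPs into one binary attribute (1 = high-risk cell)."""
    return [int(labels[(a, b)]) for a, b in zip(snp_a, snp_b)]


if __name__ == "__main__":
    # Toy XOR-like data: each SNP alone is uninformative, the pair is not.
    snp_a = [0, 0, 1, 1]
    snp_b = [0, 1, 0, 1]
    status = [0, 1, 1, 0]
    labels = mdr_high_risk_cells(snp_a, snp_b, status)
    predicted = mdr_collapse(snp_a, snp_b, labels)
    accuracy = sum(p == y for p, y in zip(predicted, status)) / len(status)
    print(predicted, accuracy)  # [0, 1, 1, 0] 1.0
```

In the demo, neither SNP alone separates cases from controls (each genotype of either SNP has one case and one control), yet the collapsed attribute matches case status exactly, which is the kind of purely epistatic pattern MDR is designed to expose.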
The Ubiquitous Nature of Epistasis in Determining Susceptibility to Common Human Diseases
A working hypothesis is formed that epistasis is a ubiquitous component of the genetic architecture of common human diseases and that complex interactions are more important than the independent main effects of any one susceptibility gene.
New strategies for identifying gene-gene interactions in hypertension
This review covers the general problem of identifying gene-gene interactions and describes several traditional and newer methods used to assess complex genetic interactions in essential hypertension.
Computational analysis of gene-gene interactions using multifactor dimensionality reduction
J. Moore. Expert Review of Molecular Diagnostics, 2004.
The author presents multifactor dimensionality reduction, a novel strategy specifically designed to identify multilocus genetic effects, and discusses several case studies demonstrating the detection of gene–gene interactions in common diseases such as atrial fibrillation, type II diabetes, and essential hypertension.
Genetic Programming II: Automatic Discovery of Reusable Programs
J. Koza. Complex Adaptive Systems, 1994.
Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer.
One of the greatest challenges facing human geneticists is the identification and characterization of susceptibility genes for common complex multifactorial human diseases. This challenge is partly…
Power of multifactor dimensionality reduction for detecting gene‐gene interactions in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity
Simulations show that multifactor dimensionality reduction (MDR) has high power to identify gene-gene interactions in the presence of 5% genotyping error, 5% missing data, phenocopy, or a combination of these. MDR has reduced power for some models in the presence of 50% phenocopy and very limited power in the presence of genetic heterogeneity.