A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction

Digna R. Velez, Bill C. White, Alison A. Motsinger, William S. Bush, Marylyn DeRiggi Ritchie, Scott M. Williams, Jason H. Moore
Genetic Epidemiology
Multifactor dimensionality reduction (MDR) was developed as a method for detecting statistical patterns of epistasis. The overall goal of MDR is to change the representation space of the data to make interactions easier to detect. It is well known that machine learning methods may not provide robust models when the class variable (e.g. case‐control status) is imbalanced and accuracy is used as the fitness measure. This is because most methods learn patterns that are relevant for the larger of… 
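To illustrate why plain accuracy can mislead on imbalanced case-control data, the sketch below compares it with balanced accuracy, taken here as the mean of sensitivity and specificity (the usual definition; the exact formulation used in the paper is not shown in the excerpt above). The toy data and the function names `confusion_counts`, `accuracy`, and `balanced_accuracy` are assumptions made for illustration only.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded 1 = case, 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of all subjects classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity (assumes both classes are present)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)  # fraction of cases called correctly
    specificity = tn / (tn + fp)  # fraction of controls called correctly
    return (sensitivity + specificity) / 2


# Hypothetical imbalanced dataset: 10 cases, 90 controls.
y_true = [1] * 10 + [0] * 90
# A degenerate classifier that labels every subject a control.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
print(f"accuracy = {acc}, balanced accuracy = {bal}")
```

In this toy case the all-controls classifier scores 0.9 on accuracy but only 0.5 on balanced accuracy, which is the kind of gap that motivates a class-weighted fitness measure.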
A computationally efficient hypothesis testing method for epistasis analysis using multifactor dimensionality reduction
This study develops and evaluates several alternatives to large‐scale permutation testing for assessing the statistical significance of MDR models, and finds that the new hypothesis testing method provides a reasonable alternative to the computationally expensive 1,000‐fold permutation test and is 50 times faster.
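For context, the baseline that this work tries to avoid can be sketched as follows. This is a generic label-permutation test, not the faster method the paper proposes (which the summary does not describe). The function name `permutation_p_value`, the toy score function, and the add-one correction in the p-value are assumptions chosen for illustration.

```python
import random


def permutation_p_value(labels, score_fn, n_permutations=1000, seed=0):
    """Estimate a p-value by shuffling class labels.

    score_fn takes a list of labels and returns a score where larger
    means a better model fit. The +1 terms are a common convention that
    keeps the estimate strictly above zero.
    """
    rng = random.Random(seed)
    observed = score_fn(labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any label/genotype association
        if score_fn(shuffled) >= observed:
            at_least_as_good += 1
    return (at_least_as_good + 1) / (n_permutations + 1)


# Hypothetical fixed model predictions that match the true labels exactly.
true_labels = [1] * 20 + [0] * 20
predictions = list(true_labels)


def agreement(labels):
    """Toy score: fraction of subjects whose label matches the prediction."""
    return sum(1 for l, p in zip(labels, predictions) if l == p) / len(labels)


p_value = permutation_p_value(true_labels, agreement)
print(p_value)
```

Each permutation requires re-scoring the model, which is why a 1,000-fold test becomes expensive when the score itself involves a full MDR search.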
Cell-Based Metrics Improve the Detection of Gene-Gene Interactions Using Multifactor Dimensionality Reduction
Two new metrics for MDR to use in evaluating models, Variance and Fisher, are proposed and compared to two previously-used MDR metrics, Balanced Accuracy and Normalized Mutual Information, finding that the proposed metrics consistently outperform the existing metrics across a variety of scenarios.
DualWMDR: Detecting epistatic interaction with dual screening and multifactor dimensionality reduction
DualWMDR, a new solution that integrates a dual screening strategy with MDR, is proposed; it uses weighted classification evaluation to improve epistasis identification on the candidate set and outperforms existing competitive methods.
MB-MDR: Model-Based Multifactor Dimensionality Reduction for detecting interactions in high-dimensional genomic data
The limitations of the MDR strategy and of nonparametric approaches are illustrated, and the value of a model-based approach is demonstrated for analyzing interactions in case-control studies where adjustment for confounding variables and for main effects is required.
Rule-based analysis for detecting epistasis using associative classification mining
This research implements associative classification and studies its effectiveness for detecting epistasis in balanced and imbalanced datasets, demonstrating significant improvements in detecting interactions associated with the phenotype.
A comparison of internal validation techniques for multifactor dimensionality reduction
Results reveal that the performance of the two internal validation methods is equivalent with the use of pruning procedures, which implies 3WS may be a powerful and computationally efficient approach to screen for epistatic effects, and could be used to identify candidate interactions in large-scale genetic studies.
Weighted Risk Score-Based Multifactor Dimensionality Reduction to Detect Gene-Gene Interactions in Nasopharyngeal Carcinoma
A novel weighted risk score-based multifactor dimensionality reduction (WRSMDR) method that uses the Bayesian posterior probability of polymorphism combinations as a new quantitative measure of disease risk is introduced.
Alternative contingency table measures improve the power and detection of multifactor dimensionality reduction
Of the 10 measures evaluated, the likelihood ratio and normalized mutual information (NMI) are measures that consistently improve the detection and power of MDR in simulated data over using classification error.
Detecting SNP Interactions in Balanced and Imbalanced Datasets using Associative Classification
Although associative classification showed only a small improvement in accuracy for balanced datasets, it outperformed existing approaches for higher-order multi-locus interactions in imbalanced datasets.
An efficiency analysis of high-order combinations of gene–gene interactions using multifactor-dimensionality reduction
FMDR alleviates the difficulties MDR has with the computational load of high-order SNP combinations and can be used to evaluate the relative effects of each individual SNP on disease susceptibility.


The effect of reduction in cross‐validation intervals on the performance of multifactor dimensionality reduction
It was found that eliminating CV made final model selection impossible, but that reducing the number of CV intervals from ten to five caused no loss of power, thereby reducing the computation time of the algorithm by half.
Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer.
One of the greatest challenges facing human geneticists is the identification and characterization of susceptibility genes for common complex multifactorial human diseases. This challenge is partly
Ideal discrimination of discrete clinical endpoints using multilocus genotypes
It is concluded that MDR ideally discriminates between low risk and high risk subjects using attributes constructed from multilocus genotype data, similar to that of a naive Bayes classifier.
Multifactor dimensionality reduction software for detecting gene-gene and gene-environment interactions
A multifactor dimensionality reduction (MDR) method for collapsing high-dimensional genetic data into a single dimension thus permitting interactions to be detected in relatively small sample sizes is developed.
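The collapsing step can be sketched roughly as below: each multilocus genotype combination is pooled into a high-risk or low-risk group, giving one binary attribute. This is a minimal illustration, not the software's implementation. The function name `mdr_collapse`, the genotype coding, the toy data, and the default case:control threshold of 1.0 are all assumptions.

```python
from collections import defaultdict


def mdr_collapse(genotypes, status, threshold=1.0):
    """Collapse multilocus genotypes into a single high/low-risk attribute.

    genotypes: list of tuples, one tuple of genotype codes per subject.
    status:    list of 1 (case) or 0 (control), one per subject.
    threshold: assumed case:control ratio above which a cell is high risk.
    Returns (collapsed, high_risk_cells), where collapsed holds 1 for
    subjects in a high-risk cell and 0 otherwise.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [cases, controls]
    for g, s in zip(genotypes, status):
        counts[g][0 if s == 1 else 1] += 1

    high_risk_cells = {
        cell
        for cell, (cases, controls) in counts.items()
        if cases > threshold * controls
    }
    collapsed = [1 if g in high_risk_cells else 0 for g in genotypes]
    return collapsed, high_risk_cells


# Hypothetical two-locus data with an XOR-like interaction and no
# main effect at either locus alone.
genotypes = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
status = [0, 0, 1, 1, 1, 1, 0, 0]

collapsed, high_risk_cells = mdr_collapse(genotypes, status)
print(collapsed, sorted(high_risk_cells))
```

In the toy data neither locus separates cases from controls on its own, yet the collapsed attribute does, which is the sense in which the change of representation makes an interaction easier to detect.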
A novel method to identify gene–gene effects in nuclear families: the MDR‐PDT
A novel test, the multifactor dimensionality reduction‐PDT, is developed by merging the MDR method with the genotype‐Pedigree Disequilibrium Test (geno‐PDT), which allows identification of single‐locus effects or joint effects of multiple loci in families of diverse structure.
Power of multifactor dimensionality reduction for detecting gene‐gene interactions in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity
Using simulated data, multifactor dimensionality reduction has high power to identify gene‐gene interactions in the presence of 5% genotyping error, 5% missing data, or a combination of both; MDR has reduced power for some models in the presence of 50% phenocopy and very limited power in the presence of genetic heterogeneity.
Computational analysis of gene-gene interactions using multifactor dimensionality reduction
J. Moore, Biology, Expert Review of Molecular Diagnostics, 2004
A novel strategy known as multifactor dimensionality reduction that was specifically designed for the identification of multilocus genetic effects is presented and several case studies that demonstrate the detection of gene–gene interactions in common diseases such as atrial fibrillation, Type II diabetes and essential hypertension are discussed.
Machine Learning for Detecting Gene-Gene Interactions
This review discusses machine-learning models and algorithms for identifying and characterising susceptibility genes in common, complex, multifactorial human diseases and focuses on the following machine-learning methods that have been used to detect gene-gene interactions: neural networks, cellular automata, random forests, and multifactor dimensionality reduction.
A testing framework for identifying susceptibility genes in the presence of epistasis.
Simulation analysis demonstrated that the FITF approach is more powerful than marginal tests of candidate genes and outperformed multifactor dimensionality reduction when interactions involved additive, dominant, or recessive genes.