The effect of reduction in cross‐validation intervals on the performance of multifactor dimensionality reduction

@article{Motsinger2006TheEO,
  title={The effect of reduction in cross‐validation intervals on the performance of multifactor dimensionality reduction},
  author={Alison A. Motsinger and Marylyn DeRiggi Ritchie},
  journal={Genetic Epidemiology},
  year={2006},
  volume={30}
}
Multifactor Dimensionality Reduction (MDR) was developed to detect genetic polymorphisms that present an increased risk of disease. Cross‐validation (CV) is an important part of the MDR algorithm, as it prevents over‐fitting and allows the predictive ability of a model to be evaluated. CV is a computationally intensive step in the MDR algorithm. Traditionally, MDR has been implemented using 10‐fold CV. In order to reduce computation time and therefore allow MDR analysis to be applied to larger… 
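The role of CV in MDR described above can be illustrated with a minimal Python sketch. It is a simplified illustration, not the authors' implementation: the function names (`fit_high_risk`, `error_rate`, `mdr_cv`), the 0/1 case-control coding, and the use of plain classification error are all assumptions made for the example. The parameter `k` is the number of CV intervals, so lowering it from 10 to 5 roughly halves the number of model searches.

```python
import random
from collections import defaultdict
from itertools import combinations


def fit_high_risk(rows, labels, loci):
    """Return the genotype cells whose case:control ratio exceeds the overall ratio."""
    cases, controls = defaultdict(int), defaultdict(int)
    for genotypes, y in zip(rows, labels):
        cell = tuple(genotypes[i] for i in loci)
        if y == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    # Cross-multiplied comparison avoids dividing by zero for empty cells.
    return {c for c in set(cases) | set(controls)
            if cases[c] * n_ctrl > controls[c] * n_case}


def error_rate(rows, labels, loci, high_risk):
    """Fraction of samples misclassified when high-risk cells are predicted as cases."""
    wrong = sum((tuple(g[i] for i in loci) in high_risk) != (y == 1)
                for g, y in zip(rows, labels))
    return wrong / len(labels)


def mdr_cv(rows, labels, order=2, k=10, seed=0):
    """k-fold CV: pick the best locus set on each training split, score it on the held-out fold."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    picks, pred_errors = [], []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        tr_rows = [rows[i] for i in train]
        tr_lab = [labels[i] for i in train]
        te_rows = [rows[i] for i in test]
        te_lab = [labels[i] for i in test]
        # Exhaustive search over locus combinations (the costly step).
        best = min(
            combinations(range(len(rows[0])), order),
            key=lambda loci: error_rate(
                tr_rows, tr_lab, loci, fit_high_risk(tr_rows, tr_lab, loci)),
        )
        picks.append(best)
        high = fit_high_risk(tr_rows, tr_lab, best)
        pred_errors.append(error_rate(te_rows, te_lab, best, high))
    winner = max(set(picks), key=picks.count)
    # Returns: chosen loci, cross-validation consistency, mean prediction error.
    return winner, picks.count(winner), sum(pred_errors) / k
```

In this sketch, the cross-validation consistency is the number of folds in which the winning locus combination was selected, and the prediction error is averaged over the held-out folds.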
A comparison of internal validation techniques for multifactor dimensionality reduction
TLDR
Results reveal that the performance of the two internal validation methods is equivalent with the use of pruning procedures, which implies 3WS may be a powerful and computationally efficient approach to screen for epistatic effects, and could be used to identify candidate interactions in large-scale genetic studies.
The Effect of Retrospective Sampling on Estimates of Prediction Error for Multifactor Dimensionality Reduction
TLDR
It is argued that a prospective error estimate is necessary if MDR models are used for prediction, and a bootstrap resampling estimate, integrating population prevalence, is proposed to accurately estimate prospective error.
A comparison of internal model validation methods for multifactor dimensionality reduction in the case of genetic heterogeneity
TLDR
Results show that the cross-validation approach greatly outperformed the three-way split approach in detecting heterogeneity, and emphasize the challenge of detecting heterogeneity models and the need for further methods development.
A Comparison of Multifactor Dimensionality Reduction and L1-Penalized Regression to Identify Gene-Gene Interactions in Genetic Association Studies
TLDR
This study compares the performance of MDR, the traditional lasso with L1 penalty (TL1), and the group lasso for categorical data with group-wise L1 penalty (GL1) to detect gene-gene interactions through a broad range of simulations to provide guidance of when each approach might be best suited for detecting and characterizing interactions with different mechanisms.
Class Balanced Multifactor Dimensionality Reduction to Detect Gene–Gene Interactions
TLDR
This study used several epistatic models with and without marginal effects under different parameter settings (heritability and minor allele frequencies) to evaluate the performance of existing approaches and found that BMDR could effectively detect significant gene–gene interactions.
Practical and Theoretical Considerations in Study Design for Detecting Gene-Gene Interactions Using MDR and GMDR Approaches
TLDR
With adjustment of a covariate, GMDR performs better than MDR, and a sample size of 1000 to 2000 is reasonably large for detecting gene-gene interactions in the range of effect sizes reported in the current literature, whereas a larger sample size is required for more subtle interactions with accuracy < 0.56.
Multiobjective multifactor dimensionality reduction to detect SNP‐SNP interactions
TLDR
A multiobjective MDR (MOMDR) method is proposed that uses the MDR contingency table as an objective function, incorporating measures including correct classification and likelihood rates to detect SNP‐SNP interactions (SSIs), and adopts set theory to predict the most favorable SSIs with cross‐validation consistency.
A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction
TLDR
The results suggest that balanced accuracy should be used instead of accuracy for the MDR analysis of epistasis in imbalanced datasets.
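To see why the choice of metric matters, the short Python sketch below compares plain accuracy with balanced accuracy, taken here as the mean of sensitivity and specificity. The function names and the toy data are assumptions made for illustration, not code from the cited study.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for 0/1 (control/case) labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy imbalanced dataset: 10 cases, 90 controls, and a model that
# predicts "control" for everyone.
y_true = [1] * 10 + [0] * 90
y_pred = [0] * 100
print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

A classifier that ignores the minority class scores well on accuracy but no better than chance on balanced accuracy, which is the behavior the summary above refers to.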
Exploring the Performance of Multifactor Dimensionality Reduction in Large Scale SNP Studies and in the Presence of Genetic Heterogeneity among Epistatic Disease Models
TLDR
These results show MDR is robust to locus heterogeneity when the definition of power is not as conservative as in previous simulation studies, and indicate that MDR performance is related more strongly to broad-sense heritability than sample size and is not greatly affected by non-model loci.
A comparison of analytical methods for genetic association studies
TLDR
This study compares the performance of six analytical approaches to detect both main effects and gene‐gene interactions in a range of genetic models and demonstrates the strengths and weaknesses of each and illustrates the importance of continued methods development.

References

Power of multifactor dimensionality reduction for detecting gene‐gene interactions in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity
TLDR
Using simulated data, multifactor dimensionality reduction has high power to identify gene‐gene interactions in the presence of 5% genotyping error, 5% missing data, or a combination of both, and MDR has reduced power for some models in the presence of 50% phenocopy and very limited power in the presence of 50% genetic heterogeneity.
Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer.
One of the greatest challenges facing human geneticists is the identification and characterization of susceptibility genes for common complex multifactorial human diseases. This challenge is partly
Multifactor dimensionality reduction software for detecting gene-gene and gene-environment interactions
TLDR
A multifactor dimensionality reduction (MDR) method for collapsing high-dimensional genetic data into a single dimension thus permitting interactions to be detected in relatively small sample sizes is developed.
An application of conditional logistic regression and multifactor dimensionality reduction for detecting gene-gene interactions on risk of myocardial infarction: The importance of model validation
TLDR
The significant interaction initially observed does not validate and may represent a type I error, and it will become increasingly important to stress model validation in order to ensure that significant effects represent true relationships rather than chance findings.
A novel method to identify gene–gene effects in nuclear families: the MDR‐PDT
TLDR
A novel test, the multifactor dimensionality reduction‐PDT, is developed by merging the MDR method with the genotype‐Pedigree Disequilibrium Test (geno‐PDT), which allows identification of single‐locus effects or joint effects of multiple loci in families of diverse structure.
Research issues and strategies for genomic and proteomic biomarker discovery and validation: a statistical perspective.
TLDR
The development and validation of clinically useful biomarkers from high-dimensional genomic and proteomic information pose great research challenges, and logistic regression has features of robustness against model misspecification, and has resistance to model overfitting.
New strategies for identifying gene-gene interactions in hypertension
TLDR
The general problem of identifying gene-gene interactions is reviewed and several traditional and several newer methods that are being used to assess complex genetic interactions in essential hypertension are described.
A perspective on epistasis: limits of models displaying no main effect.
TLDR
This article examines a large class of genetic models, delimiting the range of genetic determination and recurrence risks for two-, three-, and four-locus purely epistatic models, and reveals that these models, although giving rise to no additive or dominance variation, give rise to increased allele sharing between affected sibs.