Determination of nonlinear genetic architecture using compressed sensing

@article{Ho2015DeterminationON,
  title={Determination of nonlinear genetic architecture using compressed sensing},
  author={Chiu Man Ho and Stephen D. H. Hsu},
  journal={GigaScience},
  year={2015},
  volume={4}
}
  • C. Ho, S. Hsu
  • Published 27 August 2014
  • Biology
  • GigaScience
Background: One of the fundamental problems of modern genomics is to extract the genetic architecture of a complex trait from a data set of individual genotypes and trait values. Establishing this important connection between genotype and phenotype is complicated by the large number of candidate genes, the potentially large number of causal loci, and the likely presence of some nonlinear interactions between different genes. Compressed sensing methods obtain solutions to under-constrained systems… 
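The abstract frames the problem as recovering a sparse set of causal effects, possibly including gene-gene interactions, from an under-constrained linear system. The following is a minimal sketch of that idea, not the authors' code. It assumes NumPy and models nonlinearity as pairwise products of centered SNP genotypes. It then recovers the sparse effects with L1-penalized least squares (the LASSO), solved by iterative soft-thresholding (ISTA). The sample sizes, allele frequencies, effect sizes, noise level, penalty `lam=0.2`, and the helper names `build_features` and `lasso_ista` are all hypothetical choices made for illustration. The paper's actual algorithm, features and tuning may differ.

```python
import itertools

import numpy as np


def build_features(genotypes):
    """Standardized design matrix with additive and pairwise-interaction terms.

    Columns 0..p-1 hold the single-SNP (additive) terms; the remaining columns
    hold products of centered genotypes for every SNP pair (i, j) with i < j.
    """
    g = genotypes - genotypes.mean(axis=0)  # center each SNP
    pairs = list(itertools.combinations(range(g.shape[1]), 2))
    inter = np.column_stack([g[:, i] * g[:, j] for i, j in pairs])
    a = np.hstack([g, inter])
    a = a - a.mean(axis=0)
    sd = a.std(axis=0)
    sd[sd == 0] = 1.0  # avoid dividing by zero for a constant column
    return a / sd, pairs


def lasso_ista(a, y, lam, n_iter=3000):
    """Minimize (1/2n)||y - a @ b||^2 + lam * ||b||_1 by iterative soft-thresholding."""
    n = a.shape[0]
    step = n / np.linalg.norm(a, 2) ** 2  # 1 / Lipschitz constant of the gradient
    b = np.zeros(a.shape[1])
    for _ in range(n_iter):
        z = b - step * (a.T @ (a @ b - y)) / n  # gradient step on the squared loss
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold (L1 prox)
    return b


# Hypothetical simulation: 250 individuals, 30 SNPs, Hardy-Weinberg genotypes (0/1/2).
rng = np.random.default_rng(0)
n, p = 250, 30
freqs = rng.uniform(0.2, 0.5, size=p)
genotypes = rng.binomial(2, freqs, size=(n, p)).astype(float)

a, pairs = build_features(genotypes)  # 30 + 435 = 465 columns > 250 rows
true_idx = [0, 5, 12, p + pairs.index((1, 2)), p + pairs.index((7, 20))]
beta = np.zeros(a.shape[1])
beta[true_idx] = 1.0  # three additive effects, two pairwise interactions
y = a @ beta + rng.normal(0.0, 0.5, size=n)
y = y - y.mean()

b_hat = lasso_ista(a, y, lam=0.2)
recovered = sorted(np.argsort(-np.abs(b_hat))[: len(true_idx)].tolist())
print("true:", sorted(true_idx), "recovered:", recovered)
```

Because there are more candidate terms (465) than individuals (250), ordinary least squares has no unique solution. The L1 penalty is what singles out the sparse answer. ISTA is used here only to keep the sketch dependency-light; a packaged LASSO solver would serve equally well.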
Citations

A compressed sensing based two-stage method for detecting epistatic interactions
TLDR
Results demonstrate that CSMiner is effective and efficient in detecting epistatic interactions, and might be an alternative to existing methods.
Genetic architecture of complex traits and disease risk predictors
TLDR
It is found that the fraction of SNPs in or near genic regions varies widely by phenotype, and that exome data alone will miss much of the heritability for these traits – i.e., existing polygenic risk scores (PRS) cannot be computed from exome-sequencing data alone.
From Genotype to Phenotype: Polygenic Prediction of Complex Human Traits.
TLDR
The present status and future prospects of genomic prediction of complex traits in humans are described, and related topics such as the genetic architecture of complex traits, sibling validation of polygenic scores, and applications to adult health, in vitro fertilization (embryo selection), and genetic engineering are discussed.
On the genetic architecture of intelligence and other quantitative traits
TLDR
Some unpublished results concerning the genetic architecture of height and cognitive ability are described, which suggest that roughly 10k moderately rare causal variants of mostly negative effect are responsible for normal population variation.
Genetic risk assessment of the joint effect of several genes: Critical appraisal
TLDR
Various algorithms for model selection (searching for significant combinations of predictors) are considered, ranging from the common marginal screening of the “top” predictors to LASSO and other modern compressed sensing algorithms.
Genomic Prediction of Complex Disease Risk
TLDR
The results indicate that substantial improvements in predictive power are attainable using training sets with larger case populations, and anticipate rapid improvement in genomic prediction as more case-control data become available for analysis.
A Compressed Sensing Based Feature Extraction Method for Identifying Characteristic Genes
TLDR
A novel compressed sensing (CS) based feature extraction method named CSGS is proposed to identify characteristic genes; it is shown to be effective in identifying characteristic genes and not sensitive to parameters.
Within-Family Validation of Polygenic Risk Scores and Complex Trait Prediction
We test a variety of polygenic predictors using tens of thousands of genetic siblings for whom we have SNP genotypes, health status, and phenotype information in late adulthood. Siblings have…
Sibling validation of polygenic risk scores and complex trait prediction
TLDR
This work tests 26 polygenic predictors using tens of thousands of genetic siblings from the UK Biobank, for whom SNP genotypes, health status, and phenotype information in late adulthood are available, and finds that, typically, most of the predictive power persists in between-sibling designs.
Accurate Genomic Prediction Of Human Height
TLDR
The results resolve the common SNP portion of the “missing heritability” problem – i.e., the gap between prediction R-squared and SNP heritability.
...

References

Showing 1–10 of 35 references
Applying compressed sensing to genome-wide association studies
TLDR
Practical measures of signal recovery are robust to linkage disequilibrium between a true causal variant and markers residing in the same genomic region, and this approach is applied to the GWAS analysis of height.
Leveraging input and output structures for joint mapping of epistatic and marginal eQTLs
TLDR
A novel regularization scheme for multitask regression, called the jointly structured input–output lasso and based on an ℓ1/ℓ2 norm, is proposed; it allows shared sparsity patterns for related inputs and outputs to be optimally estimated, and it generalizes to structurally regularized polynomial regression to detect epistatic interactions with manageable complexity.
GCTA: a tool for genome-wide complex trait analysis.
Statistical analysis of genetic interactions.
  • N. Yi
  • Biology
  • Genetics Research
  • 2010
TLDR
This paper provides an overview of the available statistical methods and related computer software for identifying genetic interactions in animal and plant experimental crosses and human genetic association studies and highlights some areas of future research.
TEAM: efficient two-locus epistasis tests in human genome-wide association study
TLDR
This article proposes an efficient algorithm, TEAM, which significantly speeds up epistasis detection for human GWAS and has broader applicability and higher efficiency than existing methods for large-sample studies.
Machine Learning for Detecting Gene-Gene Interactions
TLDR
This review discusses machine-learning models and algorithms for identifying and characterising susceptibility genes in common, complex, multifactorial human diseases, focusing on the following machine-learning methods that have been used to detect gene-gene interactions: neural networks, cellular automata, random forests, and multifactor dimensionality reduction.
Data and Theory Point to Mainly Additive Genetic Variance for Complex Traits
TLDR
It is concluded that interactions at the level of genes are not likely to generate much interaction at the level of variance, and that additive variance typically accounts for over half, and often close to 100%, of the total genetic variance.
On the genetic architecture of intelligence and other quantitative traits
TLDR
Some unpublished results concerning the genetic architecture of height and cognitive ability are described, which suggest that roughly 10k moderately rare causal variants of mostly negative effect are responsible for normal population variation.
Analysis of multilocus models of association
TLDR
It is proved that statistical inference can be based on controlling the false discovery rate (FDR), defined as the expected proportion of false rejections among all rejections, and a computationally efficient form of forward stepwise regression is introduced and compared against the FDR methods.
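For reference, the false discovery rate is usually defined, following Benjamini and Hochberg, as an expected ratio. Let R be the number of rejected null hypotheses, and let V be the number of those rejections that are false:

```latex
\mathrm{FDR} = \mathbb{E}\!\left[\frac{V}{\max(R,\,1)}\right]
```

The max(R, 1) makes the ratio 0 when nothing is rejected. The definition is the expectation of the ratio, not the ratio of the expected counts.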
BOOST: A fast approach to detecting gene-gene interactions in genome-wide case-control studies
...