Matthew D. Edwards

Modern genetics has been transformed by high-throughput sequencing. New experimental designs in model organisms involve analyzing many individuals, pooled and sequenced in groups for increased efficiency. However, the uncertainty from pooling and the challenge of noisy sequencing data demand advanced computational methods. We present MULTIPOOL, a…
Regulatory proteins can bind to different sets of genomic targets in various cell types or conditions. To reliably characterize such condition-specific regulatory binding we introduce MultiGPS, an integrated machine learning approach for the analysis of multiple related ChIP-seq experiments. MultiGPS is based on a generalized Expectation Maximization…
The measurement of any nonchromosomal genetic contribution to the heritability of a trait is often confounded by the inability to control both the chromosomal and nonchromosomal information in a population. We have designed a unique system in yeast where we can control both sources of information so that the phenotype of a single chromosomal polymorphism…
We show that existing RNA-seq, DNase-seq, and ChIP-seq data exhibit overdispersed per-base read count distributions that are not matched to existing computational method assumptions. To compensate for this overdispersion we introduce a nonparametric and universal method for processing per-base sequencing read count data called FIXSEQ. We demonstrate that…
Motivation: Convolutional neural networks (CNNs) have outperformed conventional methods in modeling the sequence specificity of DNA-protein binding. Yet inappropriate CNN architectures can yield poorer performance than simpler models. Thus an in-depth understanding of how to match CNN architecture to a given task is needed to fully harness the power of CNNs…
In many human diseases, associated genetic changes tend to occur within noncoding regions, whose effect might be related to transcriptional control. A central goal in human genetics is to understand the function of such noncoding regions: given a region that is statistically associated with changes in gene expression (expression quantitative trait locus…
We present a novel ensemble-based computational framework, EnsembleExpr, that achieved the best performance in the Fourth Critical Assessment of Genome Interpretation (CAGI4) "eQTL-causal SNPs" challenge for identifying eQTLs and prioritizing their gene expression effects. Expression quantitative trait loci (eQTLs) are genome sequence variants that result…
With the advent of high-throughput sequencing technology, genome-wide association studies (GWAS) have identified thousands of genetic variants that are associated with disease and complex traits. Many of these variants reside in non-coding regions of the genome, and affect gene expression and downstream cellular phenotype by disrupting the regulatory…
Advances in high-throughput DNA sequencing have created new avenues of attack for classical genetics problems. This thesis develops and applies principled methods for analyzing DNA sequencing data from multiple pools of individual genomes. Theoretical expectations under several genetic models are used to inform specific experimental designs and guide the…