Learn More
Histone modifications are important markers of function and chromatin state, yet the DNA sequence elements that direct them to specific genomic locations are poorly understood. Here, we identify hundreds of quantitative trait loci, genome-wide, that affect histone modification or RNA polymerase II (Pol II) occupancy in Yoruba lymphoblastoid cell lines(More)
Tools for estimating population structure from genetic data are now used in a wide variety of applications in population genetics. However, inferring population structure in large modern data sets imposes severe computational challenges. Here, we develop efficient algorithms for approximate inference of the model underlying the STRUCTURE program using a(More)
Noncoding variants play a central role in the genetics of complex traits, but we still lack a full understanding of the molecular pathways through which they act. We quantified the contribution of cis-acting genetic effects at all major stages of gene regulation from chromatin to proteins, in Yoruba lymphoblastoid cell lines (LCLs). About ~65% of expression(More)
Min-cut clustering, based on minimizing one of two heuristic cost functions proposed by Shi and Malik nearly a decade ago, has spawned tremendous research, both analytic and algorithmic, in the graph partitioning and image segmentation communities over the last decade. It is, however, unclear if these heuristics can be derived from a more general principle,(More)
Introduction The impact of environment on human health is dramatic, with major risk factors including substance use 1 , diet 2 and exercise 3. However, identifying interactions between the environment and an individual's genetic background (GxE) has been hampered by statistical and computational challenges 4,5. By combining RNA sequencing of whole blood and(More)
Accurate annotation of protein coding regions is essential for understanding how genetic information is translated into function. We describe riboHMM, a new method that uses ribosome footprint data to accurately infer translated sequences. Applying riboHMM to human lymphoblastoid cell lines, we identified 7273 novel coding sequences, including 2442(More)
Identifying emerging viral pathogens and characterizing their transmission is essential to developing effective public health measures in response to an epidemic. Phylogenetics, though currently the most popular tool used to characterize the likely host of a virus, can be ambiguous when studying species very distant to known species and when there is very(More)
Motivation: Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE , a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic sequence content, leverages the(More)