The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific
Deep exome resequencing is a powerful approach for delineating patterns of protein-coding variation among genes, pathways, individuals and populations. We analyzed exome data from 2,440 individuals of European and African ancestry as part of the National Heart, Lung, and Blood Institute’s Exome Project, the aim of which is to discover novel genes and mechanisms that contribute to heart, lung and blood disorders. Each exome was sequenced to a mean coverage of 116×, allowing detailed inferences about the population genomic patterns of both common variation and rare coding variation. We identifi ed more than 500,000 single nucleotide variations, the majority of which were novel and rare (76% of variants had a minor allele frequency of less than 0.1%), refl ecting the recent dramatic increase in the size of the human population. The unprecedented magnitude of this dataset allowed us to rigorously characterize the large variation in nucleotide diversity among genes (ranging from 0 to 1.32%), as well as the role of positive and purifying selection in shaping patterns of proteincoding variation and the diff erential signatures of population structure from rare and common variation. This dataset provides a framework for personal genomics and is an important resource that will allow inferences of broad importance to human evolution and health.