NGS technologies for analyzing germplasm diversity in genebanks*
Although genomewide association studies have successfully identified associations of many common single-nucleotide polymorphisms (SNPs) with common diseases, the SNPs implicated so far account for only a small proportion of the genetic variability of tested diseases. It has been suggested that common diseases may often be caused by rare alleles missed by genomewide association studies. To identify these rare alleles we need high-throughput, high-accuracy resequencing technologies. Although array-based genotyping has allowed genomewide association studies of common SNPs in tens of thousands of samples, array-based resequencing has been limited for 2 main reasons: the lack of a fully multiplexed pipeline for high-throughput sample processing, and failure to achieve sufficient performance. We have recently solved both of these problems and created a fully multiplexed high-throughput pipeline that results in high-quality data. The pipeline consists of target amplification from genomic DNA, followed by allele enrichment to generate pools of purified variant (or nonvariant) DNA and ends with interrogation of purified DNA on resequencing arrays. We have used this pipeline to resequence approximately 5 Mb of DNA (on 3 arrays) corresponding to the exons of 1,500 genes in >473 samples; in total >2,350 Mb were sequenced. In the context of this large-scale study we obtained a false positive rate of approximately 1 in 500,000 bp and a false negative rate of approximately 10%.