Learn More
SUMMARY The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes(More)
SUMMARY The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was(More)
By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By(More)
Deviations from Hardy-Weinberg equilibrium (HWE) can indicate inbreeding, population stratification, and even problems in genotyping. In samples of affected individuals, these deviations can also provide evidence for association. Tests of HWE are commonly performed using a simple chi2 goodness-of-fit test. We show that this chi2 test can have inflated type(More)
As a first step toward understanding how rare variants contribute to risk for complex diseases, we sequenced 15,585 human protein-coding genes to an average median depth of 111× in 2440 individuals of European (n = 1351) and African (n = 1088) ancestry. We identified over 500,000 single-nucleotide variants (SNVs), the majority of which were rare (86% with a(More)
The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access, and is the format in which alignments from the 1000 Genomes Project are(More)
With millions of single-nucleotide polymorphisms (SNPs) identified and characterized, genomewide association studies have begun to identify susceptibility genes for complex traits and diseases. These studies involve the characterization and analysis of very-high-resolution SNP genotype data for hundreds or thousands of individuals. We describe a(More)
Establishing the age of each mutation segregating in contemporary human populations is important to fully understand our evolutionary history and will help to facilitate the development of new approaches for disease-gene discovery. Large-scale surveys of human genetic variation have reported signatures of recent explosive population growth, notable for an(More)
Knowledge of haplotype phase is valuable for many analysis methods in the study of disease, population, and evolutionary genetics. Considerable research effort has been devoted to the development of statistical and computational methods that infer haplotype phase from genotype data. Although a substantial number of such methods have been developed, they(More)
Levels of circulating glucose are tightly regulated. To identify new loci influencing glycemic traits, we performed meta-analyses of 21 genome-wide association studies informative for fasting glucose, fasting insulin and indices of beta-cell function (HOMA-B) and insulin resistance (HOMA-IR) in up to 46,186 nondiabetic participants. Follow-up of 25 loci in(More)