A global reference for human genetic variation

  title={A global reference for human genetic variation},
  author={Adam Auton and Gonçalo R. Abecasis and David M. Altshuler and Richard Durbin and D. R. Bentley and Aravinda Chakravarti and Andrew G. Clark and Peter Donnelly and Evan E. Eichler and Paul Flicek and S. Gabriel and Richard A. Gibbs and Eric D. Green and Matthew E. Hurles and Bartha Maria Knoppers and Jan O. Korbel and Eric S. Lander and Charles Lee and Hans Lehrach and Elaine R. Mardis and Gabor T. Marth and Gil McVean and Deborah A. Nickerson and Jeanette Schmidt and Stephen T. Sherry and Jun Wang and Richard K. Wilson and Kathleen C. Barnes and Christine M. Beiswanger and Esteban Gonz{\'a}lez Burchard and Carlos D. Bustamante and Hongyu Cai and Hongzhi Cao and Norman P. Gerry and Neda Gharani and Christopher R. Gignoux and Simon Gravel and Brenna M. Henn and Danielle Jones and Lynn B. Jorde and Jane Kaye and Alon Keinan and Alastair Kent and Angeliki Kerasidou and Yingrui Li and Rasika A. Mathias and Andr{\'e}s Moreno-Estrada and Pilar N. Ossorio and Michael Parker and Alissa M. Resch and Charles N. Rotimi and Charmaine D. M. Royal and Karla Sandoval and Yeyang Su and Ralf Sudbrak and Zhongming Tian and Sarah A. Tishkoff and Lorraine H. Toji and Chris Tyler-Smith and Marc Via and Yuhong Wang and Huanming Yang and Ling Yang and Jia Zhu and Lisa D. Brooks and Adam L. Felsenfeld and Jean E. McEwen and Yekaterina Vaydylevich and Audrey Duncanson and Michael Dunn and Jeffery A. Schloss and Erik P. Garrison and Hyun Min Kang and Jonathan Marchini and Shane A. McCarthy},
  pages={68--74}
  • A. Auton, Shane A. McCarthy
  • Published 30 September 2015
  • Biology
  • Nature
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes.
Multi-platform discovery of haplotype-resolved structural variation in human genomes
A suite of long- and short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms are applied to comprehensively analyze three human parent–child trios to define the full spectrum of human genetic variation in a haplotype-resolved manner.
A high-quality human reference panel reveals the complexity and distribution of genomic structural variants
This work analyses whole genome sequencing data of 769 individuals from 250 Dutch families, and provides a haplotype-resolved map of 1.9 million genome variants across 9 different variant classes, including novel forms of complex indels, and retrotransposition-mediated insertions of mobile elements and processed RNAs.
A high-quality reference panel reveals the complexity and distribution of structural genome changes in a human population
This work analyzes whole genome sequencing data of 769 individuals from 250 Dutch families and provides a haplotype-resolved map of 1.9 million genome variants across 9 different variant classes, including novel forms of complex indels, and retrotransposition-mediated insertions of mobile elements and processed RNAs.
Haplotype-resolved diverse human genomes and integrated analysis of structural variation
This work leverages a recently developed computational pipeline that combines long-read technology and single-cell template strand sequencing (Strand-seq) to generate fully phased diploid genome assemblies without guidance from a reference genome or parent-child trio information.
Human genetic variation database, a reference database of genetic variations in the Japanese population
The results illustrate the importance of constructing an ethnicity-specific reference genome for identifying rare variants; the authors constructed a Japanese-specific major allele reference genome, with which the number of uniquely mapped short reads in the data increased by 0.045% on average.
Discovery and genotyping of structural variation from long-read haploid genome sequence data.
Interestingly, when the authors repeat SV detection on a pseudodiploid genome constructed in silico by merging the two haploids, it is found that ∼59% of the heterozygous SVs are no longer detected by SMRT-SV, indicating that haploid resolution of long-read sequencing data will significantly increase sensitivity of SV detection.
De novo human genome assemblies reveal spectrum of alternative haplotypes in diverse populations
De novo assemblies of 17 diverse, haplotype-resolved genomes are analyzed to gain insights into the structure of genetic diversity and to compile a list of alternative haplotypes that depicts the complete spectrum of genetic diversity across populations.
Deep whole-genome sequencing of 90 Han Chinese genomes
This work performed high-depth whole-genome sequencing of 90 unrelated individuals of Chinese ancestry, drawn from the 1000 Genomes Project samples, and built a greatly expanded spectrum of genetic variation for the Han Chinese genome.
Genome maps across 26 human populations reveal population-specific patterns of structural variation
Analyzing optical genome maps of 154 individuals from the 26 populations sequenced in the 1000 Genomes Project, it is found that phylogenetic population patterns of large SVs are similar to those of single nucleotide variations in 86% of the human genome, while ~2% of the genome has high structural complexity.
Evaluation of whole exome sequencing as an alternative to BeadChip and whole genome sequencing in human population genetic analysis
Exome sequencing is shown to be a better alternative to array-based methods for population genetic analysis; a strategy for unbiased variant collection from exome data is proposed, and a bioinformatics protocol for proper data processing is offered.


A map of human genome variation from population-scale sequencing
The pilot phase of the 1000 Genomes Project is presented, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms, and the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants are described.
An integrated map of genetic variation from 1,092 human genomes
It is shown that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites.
Discovery and genotyping of genome structural polymorphism by sequencing on a population scale
An analytical framework for characterizing genome deletion polymorphism in populations using sequence data that are distributed across hundreds or thousands of genomes is presented, which offers a way to relate genome structural polymorphism to complex disease in populations.
Mapping and sequencing of structural variation from eight human genomes
This work employs a clone-based method to interrogate intermediate structural variation in eight individuals of diverse geographic ancestry and provides the first high-resolution sequence map of human structural variation—a standard for genotyping platforms and a prelude to future individual genome sequencing projects.
A second generation human haplotype map of over 3.1 million SNPs
The Phase II HapMap is described, which characterizes over 3.1 million human single nucleotide polymorphisms genotyped in 270 individuals from four geographically diverse populations and includes 25–35% of common SNP variation in the populations surveyed, and increased differentiation at non-synonymous, compared to synonymous, SNPs is demonstrated.
The landscape of human STR variation.
The largest-scale analysis of human STR variation to date is reported, using the call set collected in Phase 1 of the 1000 Genomes Project to analyze determinants of STR variation, assess the human reference genome's representation of STR alleles, find STR loci with common loss-of-function alleles, and obtain initial estimates of the linkage disequilibrium between STRs and common SNPs.
An integrated map of structural variation in 2,504 human genomes
An integrated set of eight structural variant classes, comprising both balanced and unbalanced variants, is described; the set was constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations.
Combinatorial Algorithms for Structural Variation Detection in High Throughput Sequenced Genomes
Algorithms for identifying various forms of structural variation between a paired-end NGS sequenced genome and a reference genome are described for the first time.
Haplotype-resolved genome sequencing of a Gujarati Indian individual
The throughput of massively parallel sequencing is combined with the contiguity information provided by large-insert cloning to experimentally determine the haplotype-resolved genome of a South Asian individual.
Large-scale whole-genome sequencing of the Icelandic population
The insights gained from sequencing the whole genomes of Icelanders to a median depth of 20× provide a study design that can be used to determine how variation in the sequence of the human genome gives rise to human diversity.