Building the sequence map of the human pan-genome

@article{Li2010BuildingTS,
  title={Building the sequence map of the human pan-genome},
  author={R. Li and Yingrui Li and Hancheng Zheng and Ruibang Luo and Hong-mei Zhu and Qibin Li and W. Qian and Yuanyuan Ren and G. Tian and Jinxiang Li and Guangyu Zhou and Xuan Zhu and Honglong Wu and J. Qin and Xin Jin and Dongfang Li and H. Cao and Xueda Hu and H. Blanch{\'e} and H. Cann and Xiuqing Zhang and Songgang Li and L. Bolund and K. Kristiansen and H. Yang and J. Wang},
  journal={Nature Biotechnology},
  year={2010},
  volume={28},
  pages={57--63}
}
Here we integrate the de novo assembly of an Asian and an African genome with the NCBI reference human genome, as a step toward constructing the human pan-genome. We identified ∼5 Mb of novel sequences not present in the reference genome in each of these assemblies. Most novel sequences are individual or population specific, as revealed by their comparison to all available human DNA sequence and by PCR validation using the human genome diversity cell line panel. We found novel sequences present…

Assembly of a pan-genome from deep sequencing of 910 humans of African descent
TLDR
A deeply sequenced dataset of 910 individuals, all of African descent, is used to construct a set of DNA sequences that is present in these individuals but missing from the reference human genome, demonstrating that the African pan-genome contains ~10% more DNA than the current human reference genome.
De novo assembly of a haplotype-resolved human genome
TLDR
This haplotype-resolved diploid genome represents the most complete de novo human genome assembly to date and should aid in translating genotypes to phenotypes for the development of personalized medicine.
Using population admixture to help complete maps of the human genome
TLDR
This work mapped the locations of 70 scaffolds spanning 4 million base pairs of the human genome's unplaced euchromatic sequence, including more than a dozen protein-coding genes, and identified 8 new large interchromosomal segmental duplications.
Whole-genome sequencing and comprehensive variant analysis of a Japanese individual using massively parallel sequencing
TLDR
The analysis of a Japanese male using high-throughput sequencing to ×40 coverage suggests that considerable variation remains undiscovered in the human genome and that whole-genome sequencing is an invaluable tool for obtaining a complete understanding of human genetic variation.
Building a sequence map of the pig pan-genome from multiple de novo assemblies and Hi-C data
TLDR
A pig pan-genome is constructed by comparing genome assemblies of 11 representative pig breeds with the reference genome to provide enhanced resolution for genetic diversity in pigs and can further improve the interpretation of local 3D structure.
Characterization of Missing Human Genome Sequences and Copy-number Polymorphic Insertions
TLDR
A method to accurately genotype these new insertions by mapping next-generation sequencing datasets to the breakpoint is developed, thereby providing a means to characterize copy-number status for regions previously inaccessible to single-nucleotide polymorphism microarrays.
NSIT: Novel Sequence Identification Tool
TLDR
This work developed NSIT (Novel Sequence Identification Tool), software that can accurately and efficiently identify novel sequences in an individual's de novo whole genome assembly and outperforms, by large margins, other fast sequence aligners previously applied to this task.
Towards a reference genome that captures global genetic diversity
TLDR
This work analyzes 338 high-quality human assemblies of genetically divergent human populations to identify missing sequences in the human reference genome with breakpoint resolution and constructs a Human Diversity Reference, which helps improve genome annotations.
Genetic variation and the de novo assembly of human genomes
TLDR
Recent technological advances that improve both contiguity and accuracy are summarized and the importance of complete de novo assembly as opposed to read mapping is emphasized as the primary means to understanding the full range of human genetic variation.
Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly
TLDR
It is demonstrated that whole-genome de novo assembly is a feasible approach to deriving more comprehensive maps of genetic variation and can resolve complex rearrangements.

References

The diploid genome sequence of an Asian individual
TLDR
Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly, and the potential usefulness of next-generation sequencing technologies for personal genomics.
Finishing the euchromatic sequence of the human genome
The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing…
Closing gaps in the human genome with fosmid resources generated from multiple individuals
TLDR
The analysis confirms that not all gaps within 'finished' genomes are recalcitrant to subcloning and suggests that the paired-end-sequenced fosmid libraries could prove to be a rich resource for completion of the human euchromatic genome.
The Diploid Genome Sequence of an Individual Human
TLDR
A modified version of the Celera assembler is developed to facilitate the identification and comparison of alternate alleles within this individual diploid genome, and a novel haplotype assembly strategy is used, able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome.
A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms
TLDR
This high-density SNP map provides a public resource for defining haplotype variation across the genome, and should help to identify biomedically important genes for diagnosis and therapy.
Mapping and sequencing of structural variation from eight human genomes
TLDR
This work employs a clone-based method to interrogate intermediate structural variation in eight individuals of diverse geographic ancestry and provides the first high-resolution sequence map of human structural variation, a standard for genotyping platforms and a prelude to future individual genome sequencing projects.
Genome assembly comparison identifies structural variants in the human genome
TLDR
Through comparison of two human assemblies, genome assembly comparison is shown to be a robust approach for identification of all classes of genetic variation, highlighting the need for comprehensive annotation strategies to fully interpret genome scanning and personalized sequencing projects.
A haplotype map of the human genome
TLDR
A public database of common variation in the human genome: more than one million single nucleotide polymorphisms for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted.
Accurate Whole Human Genome Sequencing using Reversible Terminator Chemistry
TLDR
An approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost is reported, effective for accurate, rapid and economical whole-genome re-sequencing and many other biomedical applications.