Building the sequence map of the human pan-genome

@article{Li2010BuildingTS,
  title={Building the sequence map of the human pan-genome},
  author={Ruiqiang Li and Yingrui Li and Hancheng Zheng and Ruibang Luo and Hongmei Zhu and Qibin Li and Wubin Qian and Yuanyuan Ren and Geng Tian and Jinxiang Li and Guangyu Zhou and Xuan Zhu and Honglong Wu and Junjie Qin and Xin Jin and Dongfang Li and Hongzhi Cao and Xueda Hu and H{\'e}l{\`e}ne Blanch{\'e} and Howard M. Cann and Xiuqing Zhang and Songgang Li and Lars Bolund and Karsten Kristiansen and Huanming Yang and Jun Wang and Jian Wang},
  journal={Nature Biotechnology},
  year={2010},
  volume={28},
  pages={57--63}
}
Here we integrate the de novo assembly of an Asian and an African genome with the NCBI reference human genome, as a step toward constructing the human pan-genome. We identified ∼5 Mb of novel sequences not present in the reference genome in each of these assemblies. Most novel sequences are individual or population specific, as revealed by their comparison to all available human DNA sequence and by PCR validation using the human genome diversity cell line panel. We found novel sequences present… 

Building a Chinese pan-genome of 486 individuals

TLDR
This study authenticates the Chinese pan-genome as representative of DNA sequences specific to the Han Chinese population missing from the GRCh38 reference genome and establishes the newly defined common sequences as candidates to supplement the current human reference.

Assembly of a pan-genome from deep sequencing of 910 humans of African descent

TLDR
A deeply sequenced dataset of 910 individuals, all of African descent, is used to construct a set of DNA sequences that is present in these individuals but missing from the reference human genome, demonstrating that the African pan-genome contains ~10% more DNA than the current human reference genome.

De novo assembly of a haplotype-resolved human genome

TLDR
This haplotype-resolved diploid genome represents the most complete de novo human genome assembly to date and should aid in translating genotypes to phenotypes for the development of personalized medicine.

Using population admixture to help complete maps of the human genome

TLDR
This work mapped the locations of 70 scaffolds spanning 4 million base pairs of the human genome's unplaced euchromatic sequence, including more than a dozen protein-coding genes, and identified 8 new large interchromosomal segmental duplications.

Comprehensively identifying and characterizing the missing gene sequences in human reference genome with integrated analytic approaches

TLDR
The study comprehensively identified and characterized genes missing from the human reference genome using RNA-Seq data from 16 different human tissues, and proposes potential reasons why these genes are absent from the reference.

Whole-genome sequencing and comprehensive variant analysis of a Japanese individual using massively parallel sequencing

TLDR
The analysis of a Japanese male using high-throughput sequencing to ×40 coverage suggests that considerable variation remains undiscovered in the human genome and that whole-genome sequencing is an invaluable tool for obtaining a complete understanding of human genetic variation.

NSIT: Novel Sequence Identification Tool

TLDR
This work developed NSIT (Novel Sequence Identification Tool), software that can accurately and efficiently identify novel sequences in an individual's de novo whole genome assembly and that outperforms, by large margins, other fast sequence aligners previously applied to this task.

Towards a reference genome that captures global genetic diversity

TLDR
This work analyzes 338 high-quality human assemblies of genetically divergent human populations to identify missing sequences in the human reference genome with breakpoint resolution and constructs a Human Diversity Reference, which helps improve genome annotations.

Assembly-free discovery of human novel sequences using long reads

TLDR
This study designed an Assembly-Free Novel Sequence (AF-NS) approach to identify novel sequences from Oxford Nanopore Technology long reads and revealed their association with the binding motifs of transcription factors.

Pan-genomics in the human genome era

TLDR
Reviews efforts to create pan-genomes for a range of species, from bacteria to humans, and further considers the computational methods that have been proposed to capture, interpret and compare pan-genome data.

References

The diploid genome sequence of an Asian individual

TLDR
Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly, and the potential usefulness of next-generation sequencing technologies for personal genomics.

Finishing the euchromatic sequence of the human genome

TLDR
The near-complete sequence reported here should serve as a firm foundation for biomedical research in the decades ahead and greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death.

Closing gaps in the human genome with fosmid resources generated from multiple individuals

TLDR
The analysis confirms that not all gaps within 'finished' genomes are recalcitrant to subcloning and suggests that the paired-end-sequenced fosmid libraries could prove to be a rich resource for completion of the human euchromatic genome.

The Diploid Genome Sequence of an Individual Human

TLDR
A modified version of the Celera assembler is developed to facilitate the identification and comparison of alternate alleles within this individual diploid genome, and a novel haplotype assembly strategy is used that spans 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome.

A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms

TLDR
This high-density SNP map provides a public resource for defining haplotype variation across the genome, and should help to identify biomedically important genes for diagnosis and therapy.

Mapping and sequencing of structural variation from eight human genomes

TLDR
This work employs a clone-based method to interrogate intermediate structural variation in eight individuals of diverse geographic ancestry and provides the first high-resolution sequence map of human structural variation—a standard for genotyping platforms and a prelude to future individual genome sequencing projects.

Genome assembly comparison identifies structural variants in the human genome

TLDR
Through comparison of two human assemblies, genome assembly comparison is shown to be a robust approach for identification of all classes of genetic variation, highlighting the need for comprehensive annotation strategies to fully interpret genome scanning and personalized sequencing projects.

A haplotype map of the human genome

TLDR
A public database of common variation in the human genome: more than one million single nucleotide polymorphisms for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted.

Accurate Whole Human Genome Sequencing using Reversible Terminator Chemistry

TLDR
An approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost is reported, effective for accurate, rapid and economical whole-genome re-sequencing and many other biomedical applications.