Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly.

@article{Schneider2017EvaluationOG,
  title={Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly},
  author={Valerie A. Schneider and Tina A. Graves-Lindsay and Kerstin Howe and Nathan Bouk and Hsiu-Chuan Chen and Paul A. Kitts and Terence D. Murphy and Kim D. Pruitt and Françoise Thibaud-Nissen and Derek Albracht and Robert S. Fulton and Milinn Kremitzki and Vincent J. Magrini and Chris Markovic and Sean D. McGrath and Karyn Meltz Steinberg and Katherine Auger and William Chow and Joanna Collins and Glenn Harden and Tim J. P. Hubbard and Sarah Pelan and Jared T. Simpson and Glen Threadgold and James Torrance and Jonathan M. D. Wood and Laura Clarke and Sergey Koren and Matthew Boitano and Paul Peluso and Heng Li and C. Chin and Adam M. Phillippy and Richard Durbin and Richard K. Wilson and Paul Flicek and Evan E. Eichler and Deanna M. Church},
  journal={Genome Research},
  year={2017},
  volume={27},
  number={5},
  pages={849--864}
}
The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and… 

Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome
TLDR
This assembly represents a ∼400-fold improvement in continuity due to properly assembled gaps, compared to the previously published C. hircus assembly, and better resolves repetitive structures longer than 1 kb, representing the largest repeat family and immune gene complex yet produced for an individual of a ruminant species.
Alignment of 1000 Genomes Project reads to reference assembly GRCh38
TLDR
This work has finished remapping all of the 1000 Genomes sequence reads to GRCh38 with alternative scaffold–aware BWA-MEM and the resulting alignments are available as CRAM, a reference-based sequence compression format.
Automated assembly of high-quality diploid human reference genomes
TLDR
Developing a combination of all the top-performing methods, this work generated the first high-quality diploid reference assembly, containing only ∼4 gaps per chromosome, with most chromosomes within ±1% of CHM13’s length.
A complete reference genome improves analysis of human genetic variation
TLDR
This work demonstrates how the T2T-CHM13 reference genome universally improves read mapping and variant calling for 3,202 and 17 globally diverse samples sequenced with short and long reads, respectively.
Sequencing and de novo assembly of 150 genomes from Denmark as a population reference
TLDR
This study shows that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. The resulting regional reference genome is expected to improve the power of future association mapping studies and pave the way for precision medicine initiatives, which are now being launched in many countries, including Denmark.
Construction and integration of three de novo Japanese human genome assemblies toward a population-specific reference
TLDR
The authors assemble three Japanese genomes to create a population-specific reference genome JG1 and demonstrate improved variant calling from exome sequencing with this reference genome.
Construction of Whole Genomes from Scaffolds Using Single Cell Strand-Seq Data
TLDR
The ability of Strand-seq to build and correct full-length chromosomes is demonstrated, by identifying which scaffolds belong to the same chromosome and determining their correct order and orientation, without the need for overlapping sequences.
New gains with a slimmer genome – An automated approach to improve reference genome assemblies
TLDR
An integrative approach to improve contiguity and haploidy of a reference genome assembly with two novel features of Lep-Anchor software and a combination of dense linkage maps, overlap detection and bridging long reads is described.
De Novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data
TLDR
De novo assembly of two Swedish genomes using long-read sequencing and optical mapping, resulting in total assembly sizes of nearly 3 Gb and hybrid scaffold N50 values of over 45 Mb, revealed over 10 Mb of sequences absent from the human GRCh38 reference in each individual.
Implications of Genetic Distance to Reference and De Novo Genome Assembly for Clinical Genomics in Africans
TLDR
With improvement in coverage, the assembly-based approach can offer a viable alternative to the alignment-based approach, with the advantage that it can obviate the need to generate diverse human reference sequences or collections of alternate scaffolds.

References

SHOWING 1-10 OF 90 REFERENCES
De novo assembly and phasing of a Korean human genome
TLDR
This work presents the most contiguous diploid human genome assembly so far, with extensive investigation of unreported and Asian-specific structural variants, and high-quality haplotyping of clinically relevant alleles for precision medicine.
Resolving the complexity of the human genome using single-molecule sequencing
TLDR
The results suggest a greater complexity of the human genome, in the form of variation in longer and more complex repetitive DNA, which can now be largely resolved with this longer-read sequencing technology.
Assemblathon 1: a competitive assessment of de novo short read assembly methods.
TLDR
The Assemblathon 1 competition is described, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies, and it is established that it is possible to assemble the genome to a high level of coverage and accuracy.
Genetic variation and the de novo assembly of human genomes
TLDR
Recent technological advances that improve both contiguity and accuracy are summarized and the importance of complete de novo assembly as opposed to read mapping is emphasized as the primary means to understanding the full range of human genetic variation.
Single haplotype assembly of the human genome from a hydatidiform mole
TLDR
Analysis of gene and repeat content shows this assembly to be of excellent quality and contiguity, and comparisons to ClinVar and the NHGRI GWAS catalog show that the CHM1 genome does not harbor an excess of deleterious alleles. However, comparison to assembly-independent resources, such as BAC clone end sequences and long reads generated by a different sequencing technology, indicates misassembled regions.
De novo assembly of a haplotype-resolved human genome
TLDR
This haplotype-resolved diploid genome represents the most complete de novo human genome assembly to date and should aid in translating genotypes to phenotypes for the development of personalized medicine.
Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species
TLDR
The high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.
Finishing the euchromatic sequence of the human genome
TLDR
The near-complete sequence reported here should serve as a firm foundation for biomedical research in the decades ahead and greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death.
Phased diploid genome assembly with single-molecule real-time sequencing
TLDR
The open-source FALCON and FALCON-Unzip algorithms are introduced to assemble long-read sequencing data into highly accurate, contiguous, and correctly phased diploid genomes.