Corpus ID: 212718240

The design and construction of reference pangenome graphs

  • Heng Li, Xiaowen Feng, Chong Chu
  • arXiv: Genomics
Recent advances in sequencing technologies enable the assembly of individual genomes to reference quality. How to integrate multiple genomes from the same species and make the integrated representation accessible to biologists remains an open challenge. Here we propose a graph-based data model and associated formats to represent multiple genomes while preserving the coordinates of the linear reference genome. We implemented our ideas in the minigraph toolkit and demonstrate that we…
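The abstract does not spell out how reference coordinates survive in the graph. In minigraph's rGFA format, each segment line carries an SN tag (the stable sequence it came from), an SO tag (its offset on that sequence) and an SR tag (its rank, with 0 meaning the linear reference). The sketch below assumes that tag layout and treats offsets as 0-based. It reads rGFA segment lines and maps a position inside any graph segment back to a stable coordinate. The helper functions, the toy graph and the contig name `asm1_ctg1` are hypothetical illustrations, not part of minigraph.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    length: int
    stable_name: str    # SN:Z: stable sequence (e.g. a reference chromosome) the segment derives from
    stable_offset: int  # SO:i: offset of the segment on that stable sequence (treated as 0-based here)
    rank: int           # SR:i: 0 = linear reference, >0 = sequence added later


def parse_rgfa_segments(lines):
    """Collect rGFA 'S' lines and their stable-coordinate tags into a dict keyed by segment name."""
    segs = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "S":
            continue  # link ('L') lines and others carry no stable coordinates
        name, seq = fields[1], fields[2]
        tags = {}
        for field in fields[3:]:
            key, typ, val = field.split(":", 2)
            tags[key] = int(val) if typ == "i" else val
        # Prefer the LN tag; otherwise use the sequence length when the sequence is present.
        length = tags.get("LN", 0 if seq == "*" else len(seq))
        segs[name] = Segment(name, length, tags["SN"], tags["SO"], tags["SR"])
    return segs


def to_stable_coordinate(segs, seg_name, pos):
    """Map a 0-based position inside a segment to (stable name, stable offset, rank)."""
    seg = segs[seg_name]
    if not 0 <= pos < seg.length:
        raise ValueError(f"position {pos} outside segment {seg_name}")
    return seg.stable_name, seg.stable_offset + pos, seg.rank


# Hypothetical toy graph: s1 and s2 tile chr1 (rank 0); s3 is an insertion from a made-up contig.
TOY_RGFA = [
    "S\ts1\tCGATGCAA\tSN:Z:chr1\tSO:i:0\tSR:i:0",
    "S\ts2\tTGCAAAGTC\tSN:Z:chr1\tSO:i:8\tSR:i:0",
    "S\ts3\tGTTAC\tSN:Z:asm1_ctg1\tSO:i:3\tSR:i:1",
    "L\ts1\t+\ts2\t+\t0M",
    "L\ts1\t+\ts3\t+\t0M",
    "L\ts3\t+\ts2\t+\t0M",
]

if __name__ == "__main__":
    segs = parse_rgfa_segments(TOY_RGFA)
    print(to_stable_coordinate(segs, "s2", 2))  # ('chr1', 10, 0)
    print(to_stable_coordinate(segs, "s3", 0))  # ('asm1_ctg1', 3, 1)
```

Because rank-0 segments keep their reference offsets, a position on the original linear reference stays addressable in the graph. Segments of higher rank report coordinates on the assembly they were taken from.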


Novel functional sequences uncovered through a bovine multi-assembly graph
A multi-assembly graph built from six reference-quality assemblies of taurine cattle and their close relatives is presented, and the non-reference sequences are shown to contain polymorphic sites that segregate within and between breeds of cattle.
Genome Assembly : A Review
  • Arun Kumar, Vishal Verma
  • 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS)
  • 2021
A genome is the complete set of DNA, including all of its genes. After reads are obtained from sequencing technologies, reconstructing the original genome is a huge task because there are millions of…
A tri-tuple coordinate system derived for fast and accurate analysis of the colored de Bruijn graph-based pangenomes
A new method, the colored superbubble (cSupB), is developed; it can overcome the complexity of graphs, organize a set of species- or population-specific haplotype sequences of interest, and be extended to any colored directed acyclic graph.
Reducing reference bias using multiple population reference genomes
This work proposes the “reference flow” alignment method, which uses multiple population reference genomes to improve alignment accuracy and reduce reference bias; compared to the graph aligner vg, it achieves a similar level of accuracy and bias avoidance with 14% of the memory footprint and 5.5 times the speed.
Reference flow: reducing reference bias using multiple population genomes
This work proposes the reference flow alignment method, which uses multiple population reference genomes to improve alignment accuracy and reduce reference bias; compared to the graph aligner vg, it achieves a similar level of accuracy and bias avoidance with 14% of the memory footprint and 5.5 times the speed.
Plant pan-genomes are the new reference
This Review summarizes the growth of plant pan-genome studies, explores the origins of gene presence and absence variation, and introduces the impacts of pan-genomes on plant biology, breeding and evolutionary studies.
Panache: a Web Browser-Based Viewer for Linearized Pangenomes
Panache, a tool for the visualization and exploration of linear representations of gene-based and sequence-based pangenomes, is introduced; it uses a layout similar to that of genome browsers to display presence/absence variations and additional tracks along a linear axis from a pangenomics perspective.
GraphAligner: Rapid and Versatile Sequence-to-Graph Alignment
GraphAligner is a tool for aligning long reads to genome graphs that is 12x faster and uses 5x less memory, making it as efficient as aligning reads to linear reference genomes.
StrainFLAIR: Strain-level profiling of metagenomic samples using variation graphs
Results demonstrated how graph representation of multiple close genomes can be used as a reference to characterize a sample at the strain level.
Haplotype-resolved de novo assembly with phased assembly graphs
Hifiasm is described, a new de novo assembler that takes advantage of long high-fidelity sequence reads to faithfully represent the haplotype information in a phased assembly graph and strives to preserve the contiguity of all haplotypes.


Ten years of pan-genome analyses.
This work reviews recent implementations of the pan-genome approach, its impact and limits, and proposes possible extensions, including analyses at the whole genome multiple sequence alignment level.
A strategy for building and using a human reference pangenome
In March 2019, 45 scientists and software engineers from around the world converged at the University of California, Santa Cruz for the first pangenomics codeathon. The purpose of the meeting was to…
Fast and accurate genomic analyses using genome graphs
The human reference genome serves as the foundation for genomics by providing a scaffold for alignment of sequencing reads, but currently only reflects a single consensus haplotype, thus impairing…
Genotyping structural variants in pangenome graphs using the vg toolkit
This work shows that variation graphs, as implemented in the vg toolkit, provide an effective means for leveraging SV catalogs for short-read SV genotyping experiments, and benchmarks vg against state-of-the-art SV genotypers using three sequence-resolved SV catalogs generated by recent long-read sequencing studies.
Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly.
It is asserted that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote the understanding of human biology and advance the efforts to improve health.
Variation graph toolkit improves read mapping by representing genetic variation in the reference
Reference genomes guide our interpretation of DNA sequence data. However, conventional linear references represent only one version of each locus, ignoring variation in the population. Poor…
NovoGraph: Human genome graph construction from multiple long-read de novo assemblies
NovoGraph, a method for the construction of a human genome graph directly from a set of de novo assemblies, is presented, which constructs a genome-wide multiple sequence alignment of all input contigs and creates a graph by merging the input sequences at positions that are both homologous and sequence-identical.
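The merging rule summarized for NovoGraph (join inputs only where they are both homologous and identical) can be illustrated at single-base resolution. This is a simplified stand-in, not NovoGraph's implementation. It assumes a gapped multiple sequence alignment given as equal-length strings with '-' for gaps, treats "homologous" as "in the same alignment column", and creates one node per distinct base in each column, so aligned inputs carrying the same base share a node. The function name `msa_to_graph` is hypothetical.

```python
def msa_to_graph(rows):
    """Build a base-level graph from equal-length MSA rows ('-' marks a gap).

    In each column, rows with the same base share one node (homologous and
    sequence-identical); rows with different bases get separate nodes.
    Returns (nodes, edges, paths): node id -> base, a set of (from, to) edges,
    and the node path spelled by each input row.
    """
    ncols = len(rows[0])
    if any(len(row) != ncols for row in rows):
        raise ValueError("all MSA rows must have the same length")
    nodes = {}
    paths = [[] for _ in rows]
    for col in range(ncols):
        node_for_base = {}  # base -> node id, reset per column so only aligned bases merge
        for i, row in enumerate(rows):
            base = row[col]
            if base == "-":
                continue  # gaps create no node; this row's path skips the column
            if base not in node_for_base:
                node_for_base[base] = len(nodes)
                nodes[node_for_base[base]] = base
            paths[i].append(node_for_base[base])
    # Consecutive nodes on each input path become directed edges.
    edges = {(a, b) for path in paths for a, b in zip(path, path[1:])}
    return nodes, edges, paths


if __name__ == "__main__":
    nodes, edges, paths = msa_to_graph(["ACGT", "AC-T", "AGGT"])
    print(len(nodes), len(edges))  # 5 6
    print(["".join(nodes[n] for n in p) for p in paths])  # ['ACGT', 'ACT', 'AGGT']
```

Each input can be read back by walking its path, so no sequence is lost by merging. The sketch does not compact unbranched chains of single-base nodes into longer segments.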
FORGe: prioritizing variants for graph genomes
It is shown that FORGe enables a range of advantageous and measurable trade-offs between accuracy and computational overhead.
Integrative Genomics Viewer
The sheer volume and scope of this flood of data pose a significant challenge to the development of efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data.
Computational pan-genomics: status, promises and challenges
  • T. Marschall, +57 authors, A. Schönhuth
  • Computer Science
  • Briefings in Bioinformatics
  • 2016
Already available approaches to construct and use pan-genomes are examined, the potential benefits of future technologies and methodologies are discussed, and open challenges from the vantage point of the above-mentioned biological disciplines are reviewed.