CoV-Seq: SARS-CoV-2 Genome Analysis and Visualization

@article{Liu2020CoVSeqSG,
  title={CoV-Seq: SARS-CoV-2 Genome Analysis and Visualization},
  author={Boxiang Liu and Kaibo Liu and He Zhang and Liang Zhang and Yuchen Bian and Liang Huang},
  journal={bioRxiv},
  year={2020}
}
Summary: COVID-19 has become a global pandemic not long after its inception in late 2019. SARS-CoV-2 genomes are being sequenced and shared on public repositories at a fast pace. To keep up with these updates, scientists need to frequently refresh and reclean datasets, which is ad hoc and labor-intensive. Further, scientists with limited bioinformatics or programming knowledge may find it difficult to analyze SARS-CoV-2 genomes. In order to address these challenges, we developed CoV-Seq, a…
SARS-CoV-2 sequence typing, evolution and signatures of selection using CoVa, a Python-based command-line utility
TLDR
CoVa is a fast, accurate and user-friendly utility to perform a variety of genome analyses on hundreds of SARS-CoV-2 sequences and shows differences in sequence type distribution between sequences from India and those from the rest of the world.
A Review on Viral Data Sources and Integration Methods for COVID-19 Mitigation
TLDR
Reviews the data integration efforts required for accessing and searching genome sequences and metadata of SARS-CoV-2, the virus responsible for COVID-19, that have been deposited into the most important repositories of viral sequences.
High Performance Integration Pipeline for Viral and Epitope Sequences
TLDR
This pipeline made it possible to design and develop fundamental resources for any researcher interested in understanding the biological mechanisms behind the viral infection, and plays a crucial role in many analytic and visualization tools, such as ViruSurf, Episurf, VirusViz, and VirusLab.
A review on viral data sources and search systems for perspective mitigation of COVID-19
TLDR
Reviews the data integration efforts required for accessing and searching genome sequences and metadata of SARS-CoV-2, the virus responsible for COVID-19, that have been deposited into the most important repositories of viral sequences.
ViruSurf: an integrated database to investigate viral sequences
TLDR
ViruSurf is a large public database of viral sequences and integrated and curated metadata from heterogeneous sources (GenBank, COG-UK and NMDC), which may enable faster responses to future threats that could arise from new viruses.

References

VAPiD: a lightweight cross-platform viral annotation pipeline and identification tool to facilitate virus genome submissions to NCBI GenBank
TLDR
A portable, lightweight, user-friendly, internet-enabled, open-source, command-line genome annotation and submission package to facilitate virus genome submissions to NCBI GenBank is created.
VIGOR, an annotation program for small viral genomes
TLDR
This is the first gene prediction program for rotavirus and rhinovirus for public access and VIGOR is able to accurately predict protein coding genes for the above five viral types and has the capability to assign function to the predicted open reading frames and genotype influenza virus.
NCBI Viral Genomes Resource
TLDR
The NCBI Viral Genomes Resource is a reference resource designed to bring order to this sequence shockwave and improve usability of viral sequence data.
Nextstrain: real-time tracking of pathogen evolution
TLDR
Nextstrain consists of a database of viral genomes, a bioinformatics pipeline for phylodynamics analysis, and an interactive visualisation platform that presents a real-time view into the evolution and spread of a range of viral pathogens of high public health importance.
The EMBL Nucleotide Sequence Database
TLDR
Changes over the past year include the removal of the sequence length limit, the launch of the EMBLCDSs dataset, extension of the Sequence Version Archive functionality and the revision of quality rules for TPA data.
A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff
TLDR
It appears that the 5′ and 3′ UTRs are reservoirs for genetic variations that change the termini of proteins during evolution of the Drosophila genus.
Data, disease and diplomacy: GISAID's innovative contribution to global health
TLDR
The article finds that the Global Initiative on Sharing All Influenza Data contributes to global health in at least five ways: collating the most complete repository of high‐quality influenza data in the world; facilitating the rapid sharing of potentially pandemic virus information during recent outbreaks; supporting the World Health Organization's biannual seasonal flu vaccine strain selection process; developing informal mechanisms for conflict resolution around the sharing of virus data.
A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data
Heng Li · Biology, Computer Science · Bioinform. · 2011
TLDR
This work presents a statistical framework for calling SNPs, discovering somatic mutations, inferring population genetical parameters and performing association tests directly based on sequencing data without explicit genotyping or linkage-based imputation and demonstrates that this method achieves comparable accuracy to alternative methods for estimating site allele count, for inferring allele frequency spectrum and for association mapping.
MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability
TLDR
This version of MAFFT has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update.
An interactive web-based dashboard to track COVID-19 in real time