Recovery of Deleted Deep Sequencing Data Sheds More Light on the Early Wuhan SARS-CoV-2 Epidemic

  • Jesse D. Bloom
  • Published 22 June 2021
  • Biology
  • Molecular Biology and Evolution, pages 5211-5224
The origin and early spread of SARS-CoV-2 remain shrouded in mystery. Here I identify a data set containing SARS-CoV-2 sequences from early in the Wuhan epidemic that has been deleted from the NIH’s Sequence Read Archive. I recover the deleted files from the Google Cloud, and reconstruct partial sequences of 13 early epidemic viruses. Phylogenetic analysis of these sequences in the context of carefully annotated existing data suggests that the Huanan Seafood Market sequences that are the focus…
Continuous mutation of SARS-CoV-2 during migration via three routes at the beginning of the pandemic
The rate of mutation was found to be comparable to that of the influenza H1N1 virus, which causes recurrent seasonal epidemics. Another threat posed by SARS-CoV-2 is that, if the pandemic cannot be contained, new variants may emerge annually, preventing herd immunity.
The molecular epidemiology of multiple zoonotic origins of SARS-CoV-2.
It is shown that SARS-CoV-2 genomic diversity before February 2020 likely comprised only two distinct viral lineages, denoted A and B, and that these lineages were the result of at least two separate cross-species transmission events into humans.
Both simulation and sequencing data reveal coinfections with multiple SARS-CoV-2 variants in the COVID-19 pandemic
Immunological challenges of the “new” infections: corona viruses
  • A. Rees
  • Biology
    A New History of Vaccines for Infectious Diseases
  • 2022
TopHap: Rapid inference of key phylogenetic structures from common haplotypes in large genome collections with limited diversity
The TopHap approach is presented, which determines spatiotemporally common haplotypes of common variants and builds their phylogeny at a fraction of the computational time of traditional methods and recovers key phylogenetic relationships from more traditional analyses.
Assessing the Relative Probability of a Zoonosis or Lab-Leak as the Origin of the SARS-CoV-2 Pandemic
While there has been significant public debate on the origins of SARS-CoV-2, the scientific debate has been unusually subdued. Members of the scientific community have often stated that a zoonosis is
Passive Immunotherapy Against SARS-CoV-2: From Plasma-Based Therapy to Single Potent Antibodies in the Race to Stay Ahead of the Variants
The recent rise and dominance of the Omicron family of variants, including the rather disparate BA.1 and BA.2 variants, demonstrate the need to continue to find new approaches to neutralize the rapidly evolving SARS-CoV-2 virus.
Preprint servers and patent prior art
Posting papers on preprint servers creates patent 'prior art' and is likely to affect the patentability of any underlying invention.
Opinion: A call for an independent inquiry into the origin of the SARS-CoV-2 virus
  • N. Harrison, J. Sachs
  • Medicine
    Proceedings of the National Academy of Sciences of the United States of America
  • 2022
Waiting for the truth: is reluctance in accepting an early origin hypothesis for SARS-CoV-2 delaying our understanding of viral emergence?
Two years after the start of the COVID-19 pandemic, key questions about the emergence of its aetiological agent (SARS-CoV-2) remain a matter of considerable debate. Identifying when SARS-CoV-2 began


On the origin and continuing evolution of SARS-CoV-2
The results suggest that the new variations in functional sites of the spike receptor-binding domain (RBD) seen in SARS-CoV-2 and in pangolin SARSr-CoVs are likely caused by natural selection in addition to recombination.
Stability of SARS-CoV-2 phylogenies
It is found that some recurrent mutations in reported SARS-CoV-2 genome sequences have been observed predominantly or exclusively by single labs, co-localize with commonly used primer binding sites and are more likely to affect the protein coding sequences than other similarly recurrent mutations.
An evolutionary portrait of the progenitor SARS-CoV-2 and its dominant offshoots in COVID-19 pandemic
The reconstructed mutational progression of SARS-CoV-2 predicts the genome sequence of the progenitor virus whose earliest offspring, without any non-synonymous mutations, were still spreading worldwide months after the report of COVID-19, which likely arose in Europe and North America after the genesis of the ancestral lineages in China.
Insights on early mutational events in SARS-CoV-2 virus reveal founder effects across geographical regions
Mutations in the helicase and orf1a coding regions from SARS-CoV-2 were predominant, among others, suggesting that these proteins are prone to evolve by natural selection.
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
A rational and dynamic virus nomenclature that uses a phylogenetic framework to identify those lineages that contribute most to active spread and is designed to provide a real-time bird’s-eye view of the diversity of the hundreds of thousands of genome sequences collected worldwide.
A dynamic nomenclature proposal for SARS-CoV-2 to assist genomic epidemiology
This work presents a rational and dynamic virus nomenclature that uses a phylogenetic framework to identify those lineages that contribute most to active spread and will assist in tracking and understanding the patterns and determinants of the global spread of SARS-CoV-2.
Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic
Three approaches estimate that the most likely divergence date of SARS-CoV-2 from its most closely related available bat sequences ranges from 1948 to 1982, indicating that there are high levels of co-infection in horseshoe bats and that the viral pool can generate novel allele combinations and substantial genetic diversity.
De-novo Assembly of RaTG13 Genome Reveals Inconsistencies Further Obscuring SARS-CoV-2 Origins
This work is a call to action for the scientific community to better collate scientific evidence about the origins of SARS-CoV-2 so that future incidence of such pandemics may be effectively mitigated.
Cryptic transmission of SARS-CoV-2 in Washington State
The large majority of SARS-CoV-2 infections sampled during this time frame appeared to have derived from a single introduction event into the state in late January or early February 2020 and subsequent local spread, strongly suggesting cryptic spread of COVID-19 during the months of January and February 2020, before active community surveillance was implemented.
Phylogenetic network analysis of SARS-CoV-2 genomes
A phylogenetic network of SARS-CoV-2 genomes sampled from across the world faithfully traces routes of infections for documented coronavirus disease 2019 (COVID-19) cases, indicating that phylogenetic networks can likewise be successfully used to help trace undocumented COVID-19 infection sources, which can be quarantined to prevent recurrent spread of the disease worldwide.