Statistical Challenges in Tracking the Evolution of SARS-CoV-2.

Lorenzo Cappello, Jaehee Kim, Sifan Liu, and Julia A. Palacios. Statistical Science: A Review Journal of the Institute of Mathematical Statistics, 37(2).

Genomic surveillance of SARS-CoV-2 has been instrumental in tracking the spread and evolution of the virus during the pandemic. The availability of SARS-CoV-2 molecular sequences isolated from infected individuals, coupled with phylodynamic methods, has provided insights into the origin of the virus, its evolutionary rate, the timing of introductions, the patterns of transmission, and the rise of novel variants that have spread through populations. Despite enormous global efforts of… 

Analysis of 6.4 million SARS-CoV-2 genomes identifies mutations associated with fitness

PyR0 is developed: a hierarchical Bayesian multinomial logistic regression model that infers the relative prevalence of all viral lineages across geographic regions, detects lineages increasing in prevalence, and identifies mutations relevant to fitness. Applying PyR0 to all publicly available SARS-CoV-2 genomes identifies numerous substitutions that increase transmissibility, including previously identified spike mutations and many non-spike mutations within the nucleocapsid and nonstructural proteins.

Bayesian Inference of Dependent Population Dynamics in Coalescent Models

The paper presents a novel probabilistic model that relies on jointly distributed Markov random fields to estimate the past population dynamics of dependent populations and to quantify their degree of dependence.

Data Science in a Time of Crisis: Lessons from the Pandemic

The exceptional shock of the COVID-19 pandemic has brought about an equally exceptional scientific response, over a wide range of disciplines and with a spirit of collaboration and mutual support.

Viral genomes reveal patterns of the SARS-CoV-2 outbreak in Washington State

The study characterizes how the spread of SARS-CoV-2 in Washington State (USA) was shaped by differences in the timing of mitigation strategies across counties and by repeated introductions of viral lineages into the state.

Tracking the COVID-19 pandemic in Australia using genomics

The application of genomics to rapidly identify SARS-CoV-2 transmission chains will become critically important as social restrictions ease globally and public health responses to emergent cases must be swift, highly focused and effective.

A Phylodynamic Workflow to Rapidly Gain Insights into the Dispersal History and Dynamics of SARS-CoV-2 Lineages

An analytical pipeline that balances fast and rigorous analytical steps is described and applied. It could be quickly applied to other countries or regions, and it complements epidemiological analyses in assessing the impact of intervention measures or their progressive easing.

Stability of SARS-CoV-2 phylogenies

It is found that some recurrent mutations in reported SARS-CoV-2 genome sequences have been observed predominantly or exclusively by single labs, co-localize with commonly used primer binding sites and are more likely to affect the protein coding sequences than other similarly recurrent mutations.

Genomic surveillance reveals multiple introductions of SARS-CoV-2 into Northern California

The genomic epidemiology of SARS-CoV-2 in Northern California from late January to mid-March 2020 is investigated using samples from 36 patients spanning nine counties and the Grand Princess cruise ship, with the aim of supporting contact tracing, social distancing, and travel restrictions to contain the spread of the virus.

Sequencing identifies multiple early introductions of SARS-CoV-2 to the New York City Region

Analysis of 864 SARS-CoV-2 sequences from cases in the New York City metropolitan area during the COVID-19 outbreak in spring 2020 showed that early transmission was most closely linked to cases from Europe.

Genomic epidemiology of a densely sampled COVID-19 outbreak in China

An analysis of 20 whole SARS-CoV-2 genomes from a single, relatively small and geographically constrained outbreak in Weifang, People's Republic of China, finds that the resulting estimates are consistent with reported cases and that there is unlikely to have been a large undiagnosed burden of infection over the period studied.

Full genome viral sequences inform patterns of SARS-CoV-2 spread into and within Israel

The findings underscore the ability of this virus to efficiently transmit between and within countries, as well as demonstrate the effectiveness of social distancing measures for reducing its spread.

Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic

Three approaches yield estimates placing the most likely divergence date of SARS-CoV-2 from its most closely related available bat sequences between 1948 and 1982. The analysis also indicates high levels of co-infection in horseshoe bats and shows that the viral pool can generate novel allele combinations and substantial genetic diversity.

The origin and early spread of SARS-CoV-2 in Europe

The study offers a view of the early state of the epidemic in Europe and of the virus's migration patterns before border closures. It finds that, before the first border closures in Europe, the rate of new cases arising from within-country transmission was within or exceeded the estimated bounds on the rate of new cases from migration.