New method to reconstruct phylogenetic and transmission trees with sequence data from infectious disease outbreaks


Whole-genome sequencing (WGS) of pathogens from host samples is becoming increasingly routine during infectious disease outbreaks. These data provide information on possible transmission events, which can be used for further epidemiologic analyses, such as identification of risk factors for infectivity and transmission. However, the relationship between transmission events and WGS data is obscured by uncertainty arising from four largely unobserved processes: transmission, case observation, within-host pathogen dynamics, and mutation. To properly resolve transmission events, these processes need to be taken into account. Recent years have seen much progress in theory and method development, but applications are tailored to specific datasets with matching model assumptions and code, or otherwise make simplifying assumptions that break up the dependency between the four processes. To obtain a method with wider applicability, we have developed a novel approach to reconstruct transmission trees with WGS data. Our approach combines elementary models for transmission, case observation, within-host pathogen dynamics, and mutation. We use Bayesian inference with MCMC, for which we have designed novel proposal steps to efficiently traverse the posterior distribution, taking account of all unobserved processes at once. This allows for efficient sampling of transmission trees from the posterior distribution, and robust estimation of consensus transmission trees. We implemented the proposed method in a new R package, phybreak. The method performs well in tests on both new and published simulated data. We apply the method to five datasets on densely sampled infectious disease outbreaks, covering a wide range of epidemiological settings. Using only sampling times and sequences as data, our analyses confirmed the original results or improved on them: the more realistic inferred infection times lend more confidence to the inferred transmission trees.

Author Summary

It is becoming easier and cheaper to obtain whole genome sequences of pathogen samples during outbreaks of infectious diseases. If all hosts during an outbreak are sampled, and these samples are sequenced, the small differences between the sequences (single nucleotide polymorphisms, SNPs) give information on the transmission tree, i.e. who infected whom, and when. However, correctly inferring this tree is not straightforward, because SNPs arise from unobserved processes including infection events, as well as pathogen growth and …

Cite this paper

@inproceedings{Klinkenberg2016NewMT, title={New method to reconstruct phylogenetic and transmission trees with sequence data from infectious disease outbreaks}, author={Don Klinkenberg and Jantien A Backer and Xavier Didelot and Caroline Colijn and Jacco Wallinga}, year={2016} }