Gaussian process test for high-throughput sequencing time series: application to experimental evolution

Authors: Hande Topa, Ágnes Jónás, Robert Kofler, Carolin Kosiol, Antti Honkela
Pages: 1762-1770

Motivation: Recent advances in high-throughput sequencing (HTS) have made it possible to monitor genomes in great detail. New experiments not only use HTS to measure genomic features at one time point but also monitor them changing over time with the aim of identifying significant changes in their abundance. In population genetics, for example, allele frequencies are monitored over time to detect significant frequency changes that indicate selection pressures. Previous attempts at analyzing… 

GPrank: an R package for detecting dynamic elements from genome-wide time series

The GPrank R package for modelling genome-wide time series is presented, incorporating variance information obtained during pre-processing of the HTS data using probabilistic quantification methods or from a beta-binomial model using sequencing depth to avoid false positives.

Quantifying Selection with Pool-Seq Time Series Data

This work introduces a highly accurate method to estimate selection parameters from replicated time series data, which is fast enough to be applied on a genome scale and shows that the effective population size and the number of replicates have the largest impact.

Multi-locus Analysis of Genomic Time Series Data from Experimental Evolution

A Gaussian process approximation to the multi-locus Wright-Fisher process with selection over a time course of tens of generations is developed, demonstrating the power of this method to correctly detect, locate and estimate the fitness of a selected allele from among several linked sites.

Inferring population genetics parameters of evolving viruses using time-series data

A computational framework that allows inference of either the fitness of a mutation, the mutation rate or the population size from genomic time-series sequencing data, and is able to categorize a mutation as Advantageous, Neutral or Deleterious.

Clear: Composition of Likelihoods for Evolve and Resequence Experiments

This article proposes a method, composition of likelihoods for evolve-and-resequence experiments (Clear), to identify signatures of selection in small-population E&R experiments, and applies the Clear statistic to multiple E&R experiments, including data from a study of adaptation of Drosophila melanogaster to alternating temperatures and a study of outcrossing yeast populations.

Bait-ER: a Bayesian method to detect targets of selection in Evolve-and-Resequence experiments

Bait-ER is presented – a fully Bayesian approach based on the Moran model of allele evolution to estimate selection coefficients from E&R experiments that avoids the computational burden of simulating an empirical null distribution, outperforming available software in terms of computational time and facilitating its use on genome-wide data.

Combining experimental evolution with next-generation sequencing: a powerful tool to study adaptation from standing genetic variation

It is shown that E&R can be powerful enough to identify causative genes and possibly even single-nucleotide polymorphisms and how the experimental design and the complexity of the trait could result in a large number of false positive candidates.

Inferring Fitness Effects from Time-Resolved Sequence Data with a Delay-Deterministic Model

A case is noted in which a deterministic model of evolution may give highly misleading inferences, resulting from the nondeterministic properties of mutation in a finite population, and an alternative approach is proposed that corrects for this error, denoted the delay-deterministic model.

Estimation of population genetic parameters using an EM algorithm and sequence data from experimental evolution populations

A novel method for estimating WF parameters (EMWER) is developed, by applying an expectation maximization algorithm to the Kolmogorov forward equation associated with the WF model diffusion approximation, which was used to infer the effective population size, selection coefficients and dominance parameters from E&R data.

A delay-deterministic model for inferring fitness effects from time-resolved genome sequence data

An alternative approach is proposed which corrects for a case in which a deterministic model of evolution may give highly misleading inferences, resulting from the non-deterministic properties of mutation in a finite population, which it is suggested can be easily identified via the use of a regular deterministic model.



Quantifying Selection Acting on a Complex Trait Using Allele Frequency Time Series Data

A population genetic method is used to analyze time series data of allele frequencies from an experiment, discovering that about 6% of polymorphic sites evolve nonneutrally under heat stress conditions, either because of their linkage to beneficial (driver) alleles or because they are drivers themselves.

Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters

The hierarchical Gaussian process model provides an excellent statistical basis for several gene-expression time-series tasks, and has only a few additional parameters over a regular GP, has negligible additional complexity, is easily implemented and can be integrated into several existing algorithms.

Bayesian hierarchical clustering for microarray time series data with replicates and outlier measurements

A generative model-based Bayesian hierarchical clustering algorithm for microarray time series that employs Gaussian process regression to capture the structure of the data and highlights the importance of including replicate information, which is found to enable the discrimination of additional distinct expression profiles.

Estimating replicate time shifts using Gaussian process regression

A statistical approach is developed that simultaneously infers both the underlying (hidden) expression profile for each gene, as well as the biological time for each individual replicate, based on Gaussian process regression (GPR) combined with a probabilistic model that accounts for uncertainty about the biological development time of each replicate.

The Power to Detect Quantitative Trait Loci Using Resequenced, Experimentally Evolved Populations of Diploid, Sexual Organisms

Forward-in-time population genetics simulations of 1 Mb genomic regions under a large combination of experimental conditions indicate that the ability to detect differentiation between populations is primarily affected by selection coefficient, population size, number of replicate populations, and number of founding haplotypes.

What paths do advantageous alleles take during short‐term evolutionary change?

Genome-wide allele frequencies are estimated at the start, 15 generations into and at the end of a 37-generation Drosophila experimental evolution study, and regions of the genome that have responded to laboratory selection are identified, in order to describe the temporal dynamics of allele frequency change.

A Guide for the Design of Evolve and Resequencing Studies

Computer simulations suggest that, with an adequate experimental design, E&R studies are a powerful tool to identify adaptive mutations from standing genetic variation and thereby provide an excellent means to analyze the trajectories of selected alleles in evolving populations.

A Robust Bayesian Two-Sample Test for Detecting Intervals of Differential Gene Expression in Microarray Time Series

A two-sample test based on Gaussian process regression is proposed for identifying intervals of differential gene expression in microarray time series; it can deal with arbitrary numbers of replicates and is robust with respect to outliers.

Genome-wide analysis of a long-term evolution experiment with Drosophila

Experimental evolution systems allow the genomic study of adaptation, and so far this has been done primarily in asexual systems with small genomes, such as bacteria and yeast. Here we present

Pervasive Genetic Hitchhiking and Clonal Interference in 40 Evolving Yeast Populations

Pervasive genetic hitchhiking is found: multiple mutations arise and move synchronously through the population as mutational ‘cohorts’, and patterns of sequence evolution are driven by a balance between these chance effects of hitchhiking and interference.