Motivation: The Illumina paired-end sequencing technology can generate reads from both ends of target DNA fragments, which can subsequently be merged to increase the overall read length. There already exist tools for merging these paired-end reads when the target fragments are equally long. However, when fragment lengths vary and, in particular, when either…
Insects are the most speciose group of animals, but the phylogenetic relationships of many major lineages remain unresolved. We inferred the phylogeny of insects from 1478 protein-coding genes. Phylogenomic analyses of nucleotide and amino acid sequences, with site-specific nucleotide or domain-specific amino acid substitution models, produced statistically…
Background: VSEARCH is an open source and free of charge multithreaded 64-bit tool for processing and preparing metagenomics, genomics and population genomics nucleotide sequence data. It is designed as an alternative to the widely used USEARCH tool (Edgar, 2010) for which the source code is not publicly available, algorithm details are only rudimentarily…
We present a substantially improved and parallelized version of DPPDiv, a software tool for estimating species divergence times and lineage-specific substitution rates on a fixed tree topology. The improvement is achieved by integrating the DPPDiv code with the Phylogenetic Likelihood Library (PLL), a fast, optimized, and parallelized collection of…
Constant advances in sequencing technology have redefined the way genome sequencing is performed. Current sequencing platforms can produce millions of short sequences (reads) in a single experiment, at a much lower cost than was previously possible. Due to the dramatic increase in the amount of data generated, efficient algorithms for aligning (mapping) these…
Motivation: In recent years, molecular species delimitation has become a routine approach for quantifying and classifying biodiversity. Barcoding methods are of particular importance in large-scale surveys as they promote fast species discovery and biodiversity estimates. Among those, distance-based methods are the most common choice as they scale well with…
The phylogenetic likelihood function (PLF) is the major computational bottleneck in several applications of evolutionary biology, such as phylogenetic inference, species delimitation, model selection, and divergence time estimation. Given the alignment, a tree, and the evolutionary model parameters, the likelihood function computes the conditional likelihood…
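To make the idea of conditional likelihoods concrete, here is a minimal Python sketch of the classic pruning recursion for a single alignment site. It is only an illustration under stated assumptions: the Jukes-Cantor (JC69) substitution model, uniform base frequencies, the nested-tuple tree encoding, and the function names `jc69_matrix`, `conditional_likelihoods`, and `site_likelihood` are all choices made for this example and are not taken from the abstract or from any library.

```python
import math

STATES = "ACGT"

def jc69_matrix(t):
    """Transition probabilities under the assumed JC69 model.

    t is the branch length in expected substitutions per site.
    """
    e = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * e
    diff = 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def conditional_likelihoods(node, site):
    """Return the four conditional likelihoods at `node` for one site.

    A node is either a leaf name (str) or a tuple of (child, branch_length)
    pairs (hypothetical encoding). `site` maps leaf names to nucleotides.
    """
    if isinstance(node, str):
        # Leaf: probability 1 for the observed state, 0 otherwise.
        return [1.0 if s == site[node] else 0.0 for s in STATES]
    result = [1.0] * 4
    for child, t in node:
        p = jc69_matrix(t)
        child_cl = conditional_likelihoods(child, site)
        for i in range(4):
            # Sum over all states the child could be in.
            result[i] *= sum(p[i][j] * child_cl[j] for j in range(4))
    return result

def site_likelihood(root, site):
    """Combine root conditional likelihoods with uniform base frequencies."""
    return sum(0.25 * x for x in conditional_likelihoods(root, site))
```

For example, `site_likelihood((("a", 0.1), ("b", 0.1)), {"a": "A", "b": "A"})` evaluates a two-leaf tree. The likelihood of a whole alignment would be the product of such per-site values, usually accumulated as a sum of logarithms to avoid underflow.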
Tong et al. comment on the accuracy of the dating analysis presented in our work on the phylogeny of insects and provide a reanalysis of our data. They replace log-normal priors with uniform priors and add a "roachoid" fossil as a calibration point. Although the reanalysis provides an interesting alternative viewpoint, we maintain that our choices were…
We introduce the Phylogenetic Likelihood Library (PLL), a highly optimized application programming interface for developing likelihood-based phylogenetic inference and postanalysis software. The PLL implements appropriate data structures and functions that allow users to quickly implement common, error-prone, and labor-intensive tasks, such as likelihood…
The longest common substring with k-mismatches problem is to find, given two strings S1 and S2, a longest substring A1 of S1 and a substring A2 of S2 of the same length such that the Hamming distance between A1 and A2 is ≤ k. We introduce a practical O(nm) time and O(1) space solution for this problem, where n and m are the lengths of S1 and S2, respectively. This algorithm can also be…
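One way to reach O(nm) time with constant extra space is to scan every diagonal of the implicit n × m comparison matrix with a two-pointer window. The Python sketch below illustrates that idea. Because the abstract is truncated, it is an assumption that this matches the authors' algorithm, and the function name `lcs_k_mismatches` is hypothetical.

```python
def lcs_k_mismatches(s1, s2, k):
    """Return (length, start1, start2) for a longest pair of equal-length
    substrings of s1 and s2 whose Hamming distance is at most k.

    Sketch only: slides a window along each diagonal and keeps just a
    few integer counters, so the extra space is constant.
    """
    n, m = len(s1), len(s2)
    best = (0, 0, 0)
    # Diagonal d aligns s1[i] with s2[i - d].
    for d in range(-(m - 1), n):
        i0 = max(d, 0)    # first position on this diagonal in s1
        j0 = max(-d, 0)   # first position on this diagonal in s2
        length = min(n - i0, m - j0)
        left = 0
        mismatches = 0
        for right in range(length):
            if s1[i0 + right] != s2[j0 + right]:
                mismatches += 1
            # Shrink the window from the left until it is valid again.
            while mismatches > k:
                if s1[i0 + left] != s2[j0 + left]:
                    mismatches -= 1
                left += 1
            window = right - left + 1
            if window > best[0]:
                best = (window, i0 + left, j0 + left)
    return best
```

Each of the n + m − 1 diagonals is traversed once by `right` and at most once by `left`, so the total work is proportional to the number of matrix cells, i.e. O(nm). Mismatches leaving the window are rediscovered by comparing the characters at `left` again, which is why no queue of mismatch positions needs to be stored.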