MOTIVATION: During the past years, next-generation sequencing has become a key technology for many applications in the biomedical sciences. Throughput continues to increase and new protocols provide longer reads than currently available. In almost all applications, read mapping is a first step. Hence, it is crucial to have algorithms and implementations that…
BACKGROUND: Second-generation sequencing technologies yield DNA sequence data at ultra-high throughput. Common to most biological applications is a mapping of the reads to an almost identical or highly similar reference genome. The assessment of the quality of read mapping results is not straightforward and has not been formalized so far. Hence, it has not…
MOTIVATION: Automatic error correction of high-throughput sequencing data can have a dramatic impact on the amount of usable base pairs and their quality. It has been shown that the performance of tasks such as de novo genome assembly and SNP calling can be dramatically improved after read error correction. While a large number of methods specialized for…
We present a simple randomized data structure for two-dimensional point sets that allows fast nearest neighbor queries in many cases. An implementation outperforms several previous implementations for commonly used benchmarks. Given a set of n points in two-dimensional Euclidean space, we want to build a linear-space data structure that can answer nearest neighbor…
Exomiser is an application that prioritizes genes and variants in next-generation sequencing (NGS) projects for novel disease-gene discovery or differential diagnostics of Mendelian disease. Exomiser comprises a suite of algorithms for prioritizing exome sequences using random-walk analysis of protein interaction networks, clinical relevance and…
We describe an approach to parallel graph partitioning that scales to hundreds of processors and produces high solution quality. For example, for many instances from Walshaw's benchmark collection we improve the best known partitioning. We use the well-known framework of multi-level graph partitioning. All components are implemented by scalable parallel…
Recent advances in rapid, low-cost sequencing have opened up the opportunity to study complete genome sequences. The computational approach of multiple genome alignment allows investigation of evolutionarily related genomes in an integrated fashion, providing a basis for downstream analyses such as rearrangement studies and phylogenetic inference. Graphs…
In this section, we describe some more involved details of the benchmark. We note that each match with distance ≤ k − 2 implies at least one match on both sides of it. Figure 1 in the main article shows an example. This can also be seen in Figure 2 in the main article. For k = 5, the third end position of the third lower branch in the left tree implies…
MOTIVATION: Large insertions of novel sequence are an important type of structural variant. Previous studies used traditional de novo assemblers for assembling non-mapping high-throughput sequencing (HTS) or capillary reads and then tried to anchor them in the reference using paired read information. RESULTS: We present approaches for detecting insertion…
Highway-Node Routing is a scheme for solving the shortest path problem that shows excellent speedups over Dijkstra's algorithm. There is also a dynamic variant that allows changes to the cost function. Based on an existing, sequential implementation, we present a parallel version of the precomputation required for Highway-Node Routing. We also present…