Corpus ID: 45836033

Sensitive Long-Indel-Aware Alignment of Sequencing Reads

T. Marschall and A. Schonhuth. arXiv: Genomics.
The tremendous advances in high-throughput sequencing technologies have made population-scale sequencing, as performed in the 1000 Genomes project and the Genome of the Netherlands project, possible. Next-generation sequencing has allowed genome-wide discovery of variations beyond single-nucleotide polymorphisms (SNPs), in particular of structural variations (SVs) like deletions, insertions, duplications, translocations, inversions, and even more complex rearrangements. Here, we design a read…


Genotyping of Inversions and Tandem Duplications
DIGTYPER (Duplication and Inversion GenoTYPER) is a novel statistical approach that computes genotype likelihoods for a given inversion or duplication and reports the maximum likelihood genotype.
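The sketch below shows in rough terms what reporting the maximum likelihood genotype involves. It scores the three diploid genotypes from counts of reads supporting the reference allele and the variant allele. The model is a simple binomial read model with a hypothetical error rate `error`, and the function names are illustrative. This is not DIGTYPER's actual likelihood model.

```python
import math

# Hypothetical simplified model (NOT DIGTYPER's actual likelihood):
# each informative read independently supports the variant allele with a
# probability that depends only on the diploid genotype.
def genotype_likelihoods(n_ref, n_alt, error=0.01):
    """Return log-likelihoods of genotypes 0/0, 0/1, 1/1 under a binomial model.

    The binomial coefficient is omitted: it is identical for all genotypes.
    """
    p_alt = {"0/0": error, "0/1": 0.5, "1/1": 1.0 - error}
    return {
        gt: n_alt * math.log(p) + n_ref * math.log(1.0 - p)
        for gt, p in p_alt.items()
    }

def max_likelihood_genotype(n_ref, n_alt, error=0.01):
    """Report the genotype whose log-likelihood is highest."""
    lls = genotype_likelihoods(n_ref, n_alt, error)
    return max(lls, key=lls.get)

print(max_likelihood_genotype(n_ref=12, n_alt=10))  # 0/1
print(max_likelihood_genotype(n_ref=1, n_alt=20))   # 1/1
```

Dropping the binomial coefficient is safe here. It is the same for all three genotypes, so it cannot change which genotype is maximal.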
Genotyping inversions and tandem duplications
Motivation: Next Generation Sequencing (NGS) has enabled studying structural genomic variants (SVs) such as duplications and inversions in large cohorts. SVs have been shown to play important roles…
Detecting Horizontal Gene Transfer by Mapping Sequencing Reads Across Species Boundaries
Daisy, a novel mapping-based tool for HGT detection directly from NGS data, is presented. It successfully detects HGT regions with base-pair resolution in both simulated and real data and outperforms alternative approaches that use a genome assembly of the reads.
Detecting horizontal gene transfer by mapping sequencing reads across species boundaries
Motivation: Horizontal gene transfer (HGT) is a fundamental mechanism that enables organisms such as bacteria to directly transfer genetic material between distant species. This way, bacteria can…
Discovering and Genotyping Twilight Zone Deletions
This chapter presents a novel maximum likelihood approach for genotyping deletions which achieves highly favorable performance rates on twilight zone indels and evaluates a comprehensive selection of state-of-the-art tools on next-generation sequencing (NGS) reads from a genome containing real variants.
MATE-CLEVER: Mendelian-inheritance-aware discovery and genotyping of midsize and long indels
MATE-CLEVER (Mendelian-inheritance-AtTEntive CLique-Enumerating Variant findER) is presented as an approach that accurately discovers and genotypes indels longer than 30 bp from contemporary NGS reads with a special focus on family data.
Enhancing sensitivity and controlling false discovery rate in somatic indel discovery using a latent variable model
A latent variable model is presented that accounts for the major confounding factors and uncertainties in somatic indel discovery within a unified framework, together with an intuitive and effective way to control the false discovery rate.
Detailed MATE-CLEVER Pipeline for GoNL
For deletion discovery, we ran the discovery part of MATE-CLEVER [3], with minor modifications that account for volatilities among library protocols. MATE-CLEVER is an integrated approach. Its major…


Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes.
Combinatorial formulations are given for SV detection between a reference genome sequence and a next-gen-based, paired-end, whole-genome shotgun-sequenced individual. All of them turn out to be fast and quite reliable.
Detecting genomic indel variants with exact breakpoints in single- and paired-end sequencing data using SplazerS
This work presents a method for 'split' read mapping, in which the prefix and suffix matches of a read may be interrupted by a longer gap in the read-to-reference alignment. It also shows that SplazerS is a versatile tool applicable to unanchored or single-end reads as well as anchored paired-end reads.
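The core idea of split-read mapping can be sketched with exact string matching:

- try every split point of the read;
- locate the prefix in the reference;
- look for the suffix a bounded distance downstream.

A gap between the two pieces indicates a deletion. This is a simplified sketch under stated assumptions: no sequencing errors, the first hit wins, and the `min_anchor` and `max_gap` parameters are hypothetical. It does not reflect SplazerS's own filtering or alignment strategy.

```python
# Hypothetical simplified split-read search (exact matches only, no errors);
# not the SplazerS algorithm.
def split_read_deletion(read, ref, min_anchor=5, max_gap=1000):
    """Return (split_point, ref_start, deletion_len) for the first split found, else None.

    split_point: length of the read prefix placed left of the deletion.
    ref_start:   reference position where the prefix matches.
    deletion_len: number of reference bases skipped between prefix and suffix.
    """
    for k in range(min_anchor, len(read) - min_anchor + 1):
        prefix, suffix = read[:k], read[k:]
        start = ref.find(prefix)
        while start != -1:
            prefix_end = start + k
            # Look for the suffix downstream of the prefix, at most max_gap away.
            window = ref[prefix_end:prefix_end + max_gap + len(suffix)]
            offset = window.find(suffix)
            if offset > 0:  # offset 0 would be a contiguous (gap-free) match
                return k, start, offset
            start = ref.find(prefix, start + 1)  # try the next prefix hit
    return None

left, deleted, right = "AACGTTGCAT", "GGGTTTAAAC", "TCAGGATCCA"
ref = left + deleted + right
read = left + right  # the donor genome lacks the middle 10 bp
print(split_read_deletion(read, ref))  # (10, 0, 10)
```

In repetitive sequence the breakpoint is often ambiguous, and this sketch simply reports the first consistent split. A real split-read mapper has to handle that ambiguity and sequencing errors explicitly.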
Computational methods for discovering structural variation with next-generation sequencing
A new generation of methods is being developed to tackle the challenges of short reads while taking advantage of the high coverage that the new sequencing technologies provide.
Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads.
Stampy is a read mapper that uses a hybrid mapping algorithm and a detailed statistical model to achieve both speed and sensitivity, particularly when reads include sequence variation. This results in a higher usable sequence yield and improved accuracy compared to existing software.
CLEVER: clique-enumerating variant finder
A novel internal-segment-size-based approach is presented that organizes all reads, including concordant ones, into a read alignment graph. In this graph, max-cliques represent maximal contradiction-free groups of alignments, which are then statistically evaluated for their potential to reflect insertions or deletions.
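The sketch below illustrates the read-alignment-graph idea in its simplest form.

- Each node is a read pair's internal segment, given as hypothetical `(start, end)` reference coordinates.
- Two nodes are joined if their segments overlap and their lengths differ by at most `max_size_diff`. This rule is an assumed stand-in for CLEVER's statistical compatibility test.
- Maximal cliques are enumerated with the classic Bron–Kerbosch algorithm.

The subsequent statistical evaluation of the cliques is omitted.

```python
from itertools import combinations

def build_graph(segments, max_size_diff=50):
    """Build an adjacency map over internal segments.

    Hypothetical edge rule: the segments overlap on the reference and their
    lengths differ by at most max_size_diff.
    """
    adj = {i: set() for i in range(len(segments))}
    for i, j in combinations(range(len(segments)), 2):
        (s1, e1), (s2, e2) = segments[i], segments[j]
        overlap = s1 < e2 and s2 < e1
        similar = abs((e1 - s1) - (e2 - s2)) <= max_size_diff
        if overlap and similar:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def bron_kerbosch(r, p, x, adj, out):
    """Classic Bron-Kerbosch without pivoting: collect every maximal clique."""
    if not p and not x:
        out.append(sorted(r))
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p.remove(v)
        x.add(v)

def max_cliques(adj):
    out = []
    bron_kerbosch(set(), set(adj), set(), adj, out)
    return sorted(out)

# Internal segments of five read pairs (illustrative coordinates).
segments = [(100, 400), (150, 450), (200, 500), (300, 900), (350, 950)]
print(max_cliques(build_graph(segments)))  # [[0, 1, 2], [3, 4]]
```

In this toy input, the clique {3, 4} groups pairs whose internal segments are about 300 bp longer than those in {0, 1, 2}. A deletion would produce this kind of consistent signal, and a method like CLEVER would then assess it statistically.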
DELLY: structural variant discovery by integrated paired-end and split-read analysis
DELLY is an SV discovery method that integrates short-insert paired-ends, long-range mate-pairs, and split-read alignments to accurately delineate genomic rearrangements at single-nucleotide resolution. This makes it possible to ascertain the full spectrum of genomic rearrangements, including complex events.
Assemblathon 1: a competitive assessment of de novo short read assembly methods.
The Assemblathon 1 competition is described, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies, and it is established that it is possible to assemble the genome to a high level of coverage and accuracy.
Personalized Copy-Number and Segmental Duplication Maps using Next-Generation Sequencing
An algorithm (mrFAST) is presented to comprehensively map next-generation sequence reads, which allows for the prediction of absolute copy-number variation of duplicated segments and genes, and can distinguish between different copies of highly identical genes.
Fast and accurate short read alignment with Burrows–Wheeler transform
The Burrows–Wheeler Alignment tool (BWA) is a new read alignment package based on backward search with the Burrows–Wheeler Transform (BWT). It efficiently aligns short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
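The sketch below shows backward search for readers unfamiliar with it. It counts exact occurrences of a pattern using a Burrows–Wheeler transform, the C table (number of smaller characters), and occurrence counts. It deliberately covers only exact matching, so BWA's tolerance of mismatches and gaps is not shown. It also uses naive construction (sorting all rotations) and naive occurrence counting, which real FM-indexes avoid. The function names are illustrative.

```python
def bwt(text):
    """Naive BWT via sorted rotations; text must end with a unique, smallest sentinel '$'."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def backward_search(pattern, text):
    """Count exact occurrences of pattern in text using FM-index backward search."""
    text += "$"
    b = bwt(text)
    # C[c]: number of characters in text that are lexicographically smaller than c.
    C, total = {}, 0
    for c in sorted(set(text)):
        C[c] = total
        total += text.count(c)

    def occ(c, i):
        # Occurrences of c in b[:i] (naive; real indexes use sampled tables).
        return b[:i].count(c)

    lo, hi = 0, len(b)  # half-open suffix-array interval [lo, hi)
    for c in reversed(pattern):  # extend the match one character to the left
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo

print(bwt("BANANA$"))                     # ANNB$AA
print(backward_search("ANA", "BANANA"))  # 2
```

Each step narrows the interval of sorted suffixes that start with the already-processed pattern suffix. The final interval size is therefore the number of occurrences.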
An initial map of insertion and deletion (INDEL) variation in the human genome.
An initial map of human INDEL variation that contains 415,436 unique INDEL polymorphisms, which range from 1 bp to 9989 bp in length and are split almost equally between insertions and deletions, relative to the chimpanzee genome sequence.