Sensitive and fast mapping of di-base encoded reads

@article{Hormozdiari2011SensitiveAF,
  title={Sensitive and fast mapping of di-base encoded reads},
  author={Farhad Hormozdiari and Faraz Hach and S{\"u}leyman Cenk Sahinalp and Evan E. Eichler and Can Alkan},
  journal={Bioinform.},
  year={2011},
  volume={28},
  pages={150}
}
Previously, we used ‘–seed S20 -k 10000 -v 4’. With this update, PerM now achieves full sensitivity in our simulation experiment. With real datasets (Table 6), PerM tends to map more reads than Bowtie, but slightly fewer than Mapreads and SOCS. We apologize for the parameter sets previously used for PerM, which resulted from our misinterpretation of its documentation. We now update the relevant rows in Tables 3 and 6 as follows. Table 3. Performance of PerM with simulated…

Accelerating read mapping with FastHASH
TLDR
A new algorithm, FastHASH, is proposed, which drastically improves the performance of the seed-and-extend type hash table based read mapping algorithms, while maintaining the high sensitivity and comprehensiveness of such methods.
The effects of sampling on the efficiency and accuracy of k-mer indexes: Theoretical and empirical comparisons using the human genome
TLDR
It is found that soft sampling significantly reduces both index size and query time with relatively small losses in query accuracy when identifying HSLAs, and a new model for sampling with BLAST is provided that predicts empirical retention rates with reasonable accuracy by modeling two key problem factors.
Boosting high throughput sequencing data compression algorithms using reordering
TLDR
SCALCE is presented, a “boosting” scheme based on Locally Consistent Parsing technique which reorganizes the reads in a way that results in a higher compression speed and compression rate, independent of the compression algorithm in use and without using a reference genome.
mrsFAST-Ultra: a compact, SNP-aware mapper for high performance sequencing applications
High throughput sequencing (HTS) platforms generate unprecedented amounts of data that introduce challenges for processing and downstream analysis. While tools that report the ‘best’ mapping location
GRIM-Filter: Fast seed location filtering in DNA read mapping using processing-in-memory technologies
TLDR
This work proposes a novel seed location filtering algorithm, GRIM-Filter, optimized to exploit 3D-stacked memory systems that integrate computation within a logic layer stacked under memory layers, to perform processing-in-memory (PIM).
Structural Variant Calling
TLDR
This work used whole-genome shotgun paired-end sequence data generated with both Illumina and Applied Biosystems SOLiD platforms from the genomes of six canid samples to estimate the fraction of the genome with segmental duplications.
ALPHA: A Novel Algorithm-Hardware Co-design for Accelerating DNA Seed Location Filtering
TLDR
An algorithm-hardware co-design is proposed that exploits the data-reuse in the seed location filtering operation and, compared to the GRIM-Filter, cuts the number of memory accesses by 22-54%, which improves the overall performance and energy consumption.
Comparing fixed sampling with minimizer sampling when using k-mer indexes to find maximal exact matches
TLDR
It is argued that for any application where each shared k-mer occurrence must be processed, fixed sampling is the right sampling method.
Novel computational techniques for mapping and classifying Next-Generation Sequencing data. (Nouvelles techniques informatiques pour la localisation et la classification de données de séquençage haut débit)
TLDR
This thesis presents novel computational techniques for read mapping and taxonomic classification of NGS reads and provides the first comprehensive overview of this method and demonstrates its qualities using Dynamic Mapping Simulator, a pipeline that compares various dynamic mapping scenarios to static mapping and iterative referencing.

References

mrsFAST: a cache-oblivious algorithm for short-read mapping
TLDR
In almost all recent structural variation discovery (SVD) studies, short reads from a donor genome have been mapped to a reference genome as a first step; the accuracy of such an SVD study is directly correlated with this mapping step, which is also the main computational bottleneck of the SVD study.
PerM: efficient mapping of short sequencing reads with periodic full sensitive spaced seeds
TLDR
The mapping software, named PerM (Periodic Seed Mapping) is presented that uses periodic spaced seeds to significantly improve mapping efficiency for large reference genomes when compared with state-of-the-art programs.
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
TLDR
Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds.
BFAST: An Alignment Tool for Large Scale Genome Resequencing
TLDR
It is shown BFAST can achieve substantially greater sensitivity of alignment in the context of errors and true variants, especially insertions and deletions, and minimize false mappings, while maintaining adequate speed compared to other current methods.
Fast and accurate short read alignment with Burrows–Wheeler transform
TLDR
Burrows-Wheeler Alignment tool (BWA) is implemented, a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
SOAP: short oligonucleotide alignment program
TLDR
The program SOAP is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology, which supports multi-threaded parallel computing and has a batch module for multiple query sets.
Detection and characterization of novel sequence insertions using paired-end next-generation sequencing
TLDR
The NovelSeq framework can be built as part of a general sequence analysis pipeline to discover multiple types of genetic variation (SNPs, structural variation, etc.); thus it requires significantly fewer computational resources than de novo sequence assembly.
Mapping short DNA sequencing reads and calling variants using mapping quality scores.
TLDR
This work describes MAQ, software that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample.
SHRiMP: Accurate Mapping of Short Color-space Reads
TLDR
It is demonstrated that SHRiMP can accurately map reads to this highly polymorphic genome, while confirming high heterozygosity of C. savignyi in this second individual.
Technology-specific error signatures in the 1000 Genomes Project data
TLDR
It is highlighted that different NGS platforms are suited to different practical applications, and that NGS-based studies require stringent data quality control for their results to be valid, while the use of multiple NGS platforms may be more cost-efficient than relying upon a single technology alone.