Ordered index seed algorithm for intensive DNA sequence comparison

  • D. Lavenier
  • Published 14 April 2008
  • Computer Science
  • 2008 IEEE International Symposium on Parallel and Distributed Processing
This paper presents a seed-based algorithm for intensive DNA sequence comparison. The novelty lies in the way seeds are used to efficiently generate small ungapped alignments, or HSPs (high-scoring pairs), in the first stage of the search. W-nt words are first indexed, and all 4^W possible seeds are then enumerated in a strict order that ensures fast generation of unique HSPs. A prototype written in C has been implemented and tested on large DNA banks. Speed-up compared to BLASTN range…
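The abstract gives only the outline of the first stage, so the following Python sketch is an illustration under stated assumptions, not the paper's C prototype. It indexes the W-nt words of two sequences, visits seeds in increasing numeric order, and extends each hit without gaps. The rule used here to keep HSPs unique (abandon an extension as soon as it crosses a seed of lower order, or of equal order on the left) is an assumption about how a strict seed order can avoid duplicates. The function names, the scoring values (+1/-3), the X-drop threshold, and the restriction to the letters A, C, G, T are all hypothetical choices.

```python
# Hypothetical sketch: ordered seed enumeration with ungapped extension.
# Assumes sequences contain only A, C, G, T.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def seed_code(seq, pos, w):
    """Rank of the W-nt word at seq[pos:pos+w] among the 4^W possible words."""
    code = 0
    for ch in seq[pos:pos + w]:
        code = code * 4 + CODE[ch]
    return code


def index_words(seq, w):
    """Map each seed code to the list of positions where it occurs."""
    index = {}
    for pos in range(len(seq) - w + 1):
        index.setdefault(seed_code(seq, pos, w), []).append(pos)
    return index


def extend(s1, s2, i, j, w, code, step, xdrop):
    """Ungapped X-drop extension of seed (i, j); step=-1 is left, +1 is right.

    Returns (best_score, best_length), or None when the extension crosses a
    seed that comes earlier in the order (assumed uniqueness rule).
    """
    p1 = i - 1 if step < 0 else i + w
    p2 = j - 1 if step < 0 else j + w
    score = best = best_len = length = 0
    run = w  # consecutive matches touching the frontier (the seed itself)
    while 0 <= p1 < len(s1) and 0 <= p2 < len(s2) and best - score <= xdrop:
        length += 1
        if s1[p1] == s2[p2]:
            score += 1
            run += 1
            if run >= w:
                # The last w matched positions form another seed on this diagonal.
                start = p1 if step < 0 else p1 - w + 1
                other = seed_code(s1, start, w)
                if other < code or (step < 0 and other == code):
                    return None  # this HSP belongs to an earlier seed
            if score > best:
                best, best_len = score, length
        else:
            score -= 3
            run = 0
        p1 += step
        p2 += step
    return best, best_len


def ordered_hsps(s1, s2, w=4, xdrop=6):
    """Return HSPs as (start1, start2, length, score), one per alignment."""
    idx1, idx2 = index_words(s1, w), index_words(s2, w)
    hsps = []
    for code in range(4 ** w):  # strict increasing seed order
        for i in idx1.get(code, ()):
            for j in idx2.get(code, ()):
                left = extend(s1, s2, i, j, w, code, -1, xdrop)
                if left is None:
                    continue
                right = extend(s1, s2, i, j, w, code, +1, xdrop)
                if right is None:
                    continue
                hsps.append((i - left[1], j - left[1],
                             left[1] + w + right[1], left[0] + w + right[0]))
    return hsps


print(ordered_hsps("ACGTACGT", "ACGTACGT", w=4))
```

In this example the main diagonal contains five seed hits, yet the full-length alignment is reported only once, from the leftmost occurrence of its lowest-order seed; the two shorter off-diagonal hits are reported separately.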


Parallelize ORIS Algorithm for DNA Comparison
The Ordered Index Seed (ORIS) algorithm is a recently published pairwise DNA comparison algorithm that can dramatically improve DNA alignment performance. Its design also has the potential ability to…
Using Binary Decision Diagrams (BDDs) for Memory Optimization in Basic Local Alignment Search Tool (BLAST)
A BDD-based version of BLAST is developed that omits redundant information shared by the aligned sequences. It shows a considerable improvement in memory usage, saving up to 63.95% of memory, with a negligible performance degradation of only 3.10%.
KLAST: fast and sensitive software to compare large genomic databanks on cloud
KLAST is sequence comparison software optimized to compare two nucleotide or protein data sets, typically a set of query sequences and a reference bank. A Hadoop version has been designed to scale up to NGS data processing.
Parallelization on graphic hardware : contributions to RNA folding and sequence alignment. (Parallélisation sur matériel graphique : contributions au repliement d'ARN et à l'alignement de séquences)
The main contribution is the development of a new algorithm that quickly filters candidate alignment locations, based on the precomputation of tiles of the dynamic programming matrix. It proved to be more effective in a sequential CPU program and led to an efficient new CPU aligner.
Pathways in Bioinformatics: A Window in Computer Science
This paper looks at bioinformatics from the perspective of computer science, and provides an entry point and pathways for more active and productive participation.


YASS: enhancing the sensitivity of DNA similarity search
YASS applies transition-constrained seeds to specify the most probable conserved motifs between homologous sequences, combined with a flexible hit criterion used to identify groups of seeds that are likely to exhibit significant alignments.
Protein Similarity Search with Subset Seeds on a Dedicated Reconfigurable Hardware
This is the first attempt to exploit efficient seed-based algorithms to parallelize sequence similarity search on specialized parallel hardware with a reconfigurable architecture (FPGA), where the FPGA is tightly connected to large-capacity Flash memories.
FLASH: a fast look-up algorithm for string homology
  • A. Califano, I. Rigoutsos
  • Computer Science, Medicine
  • Proceedings of IEEE Conference on Computer Vision and Pattern Recognition
  • 1993
The algorithm presented is based on a probabilistic indexing framework which requires minimal access to the database for each match, and is shown to scale well to databases containing billions of nucleotides with performances that are orders of magnitude better than the fastest of the current techniques.
Improved tools for biological sequence comparison.
  • W. Pearson, D. Lipman
  • Biology, Medicine
  • Proceedings of the National Academy of Sciences of the United States of America
  • 1988
Three computer programs for comparisons of protein and DNA sequences can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity.
Basic local alignment search tool.
A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP)…
BLAT--the BLAST-like alignment tool.
This paper describes how BLAT was optimized. BLAT is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments, and 50 times faster for protein alignments, at sensitivity settings typically used when comparing vertebrate sequences.
Optimizing Multiple Spaced Seeds for Homology Search
This work describes a linear programming (LP)-based algorithm to optimize a set of seeds and offers a performance guarantee: the sensitivity of a chosen seed set is at least 70% of what can be achieved, in most reasonable models of homologous sequences.
PatternHunter: faster and more sensitive homology search
A new homology search algorithm 'PatternHunter' is presented that uses a novel seed model for increased sensitivity and new hit-processing techniques for significantly increased speed.
An improved algorithm for matching biological sequences.
  • O. Gotoh
  • Biology, Medicine
  • Journal of molecular biology
  • 1982
The algorithm of Waterman et al. (1976) for matching biological sequences was modified, under some limitations, to run in essentially MN steps instead of the M^2 N steps necessary…
A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins
A computer-adaptable method for finding similarities in the amino acid sequences of two proteins has been developed. From these findings it is possible to determine whether significant homology…