Rapid and sensitive dot-matrix methods for genome analysis
@article{Huang2004RapidAS,
title={Rapid and sensitive dot-matrix methods for genome analysis},
author={Yue Huang and Ling Zhang},
journal={Bioinformatics},
year={2004},
volume={20},
number={4},
pages={460--466}
}
MOTIVATION
Dot-matrix plots are widely used for similarity analysis of biological sequences, and many algorithms and software tools have been developed for this purpose. Although some of these tools have been reported to handle sequences of a few hundred kb, analysis of genome sequences longer than 10 Mb on a microcomputer is still impractical because of long execution times and high memory requirements.
RESULTS
Two dot-matrix comparison methods have been developed for analysis of large…
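To make the cost problem described in the abstract concrete, here is a minimal sketch of a naive word-based dot matrix. It is not the authors' method (the abstract is truncated before describing it); the function names `dot_matrix` and `render` and the `word` parameter are illustrative assumptions. The nested loops compare every position pair, which is why time grows with the product of the two sequence lengths.

```python
def dot_matrix(seq_a, seq_b, word=3):
    """Return the set of (i, j) where seq_a[i:i+word] == seq_b[j:j+word].

    Naive illustration only: every pair of start positions is compared,
    so the running time is proportional to len(seq_a) * len(seq_b).
    """
    dots = set()
    for i in range(len(seq_a) - word + 1):
        for j in range(len(seq_b) - word + 1):
            if seq_a[i:i + word] == seq_b[j:j + word]:
                dots.add((i, j))
    return dots


def render(dots, n_rows, n_cols):
    """Draw the dots as text rows: '*' marks a match, '.' marks no match."""
    return [
        "".join("*" if (i, j) in dots else "." for j in range(n_cols))
        for i in range(n_rows)
    ]


if __name__ == "__main__":
    a, b = "ACGTACGT", "ACGT"
    for row in render(dot_matrix(a, b, word=4), len(a), len(b)):
        print(row)
```

For two 10 Mb sequences this loop would visit on the order of 10^14 position pairs, which illustrates why faster indexing schemes are needed at genome scale.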
45 Citations
Breaking the computational barriers of pairwise genome comparison
- Biology, Computer Science
- BMC Bioinformatics
- 2015
This work has addressed the problem of pairwise and all-versus-all comparison of large sequences in general, greatly increasing the limits on input data size with a modular out-of-core strategy that uses secondary storage to avoid reaching memory limits during the identification of High-scoring Segment Pairs between the sequences under comparison.
A computational tool for the genomic identification of regions of unusual compositional properties and its utilization in the detection of horizontally transferred sequences.
- Biology
- Molecular Biology and Evolution
- 2006
This work uses S-plot in a 2-step approach to identify regions that may have originated through horizontal gene transfer; analysis of RUCPs in O157:H7 leads to the inference that there were at least 53 sources of horizontally transferred sequences.
MAGMA: An Algorithm for Mining Multi-level Patterns in Genomic Data
- Biology
- 2007 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2007)
- 2007
The proposed algorithm, MGC, is able to discover the similarities and dissimilarities among different genomes and, in addition, to confirm the specific role of a gene in the genomes and to show variation among species and similarity within species.
A Pattern Matching Technique for Multiple Sequences Alignment with GAP Consideration
- Computer Science
- 2009 International Conference on Signal Acquisition and Processing
- 2009
An efficient recursive approach is proposed in this paper that not only finds the multiple sequence alignment for protein/DNA sequences but also provides a means of accounting for gaps between them.
Comparative genome sequence analysis by efficient pattern matching technique
- Computer Science
- 2008
This paper presents an algorithm that provides an approximate comparative match between any input strands and overcomes the drawbacks and shortcomings of prevailing techniques.
Comparison of Genomes As 2-Level Pattern Analysis
- Biology
- 2006
Results generally supported the hypothesis that genes of the cell envelope provide the variable loci required to form an effective source of variation among species that are otherwise very similar.
Gepard: a rapid and sensitive tool for creating dotplots on genome scale
- Computer Science
- Bioinform.
- 2007
Gepard provides a user-friendly, interactive application for the quick creation of dotplots. It utilizes suffix arrays to reduce the time complexity of dotplot calculation to Theta(m*log…
OxfordGrid: a web interface for pairwise comparative map views
- Computer Science
- Bioinform.
- 2005
OxfordGrid is a web application and database schema for storing and interactively displaying genetic map data in a comparative dot-plot fashion, representing a pairwise comparison of mapped probe data for two linkage groups or chromosomes.
Recurrent DNA copy number variation in the laboratory mouse
- Biology
- Nature Genetics
- 2007
A genome-wide analysis of spontaneous copy number variation (CNV) in the laboratory mouse finds that many CNVs arise through a highly nonrandom process and that rates of change vary by roughly four orders of magnitude across different loci.
References
A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis.
- Computer Science
- Gene
- 1995
A fast word search algorithm for the representation of sequence similarity in genomic DNA.
- Computer Science
- Nucleic Acids Research
- 1994
An improvement is proposed through the preprocessing of the data into an automaton that recognizes the word structure of a sequence, so that repetitions are systematically eliminated during word comparison.
Rapid similarity searches of nucleic acid and protein data banks.
- Biology, Computer Science
- Proceedings of the National Academy of Sciences of the United States of America
- 1983
An algorithm for the global comparison of sequences, based on matching k-tuples of sequence elements for a fixed k, substantially reduces the time required to search a data bank compared with prior techniques of similarity analysis, with minimal loss in sensitivity.
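The k-tuple idea summarized above can be sketched as follows. This is an illustration of the general lookup-table technique, not a reproduction of the published algorithm; the function names `ktuple_index` and `diagonal_counts` are hypothetical. One sequence is indexed by its k-tuples, the other is scanned once, and shared k-tuples are tallied by diagonal (offset), so that diagonals with many hits point to candidate regions of similarity.

```python
from collections import Counter, defaultdict


def ktuple_index(seq, k):
    """Map each k-tuple in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def diagonal_counts(query, target, k):
    """Count shared k-tuples per diagonal, where diagonal = j - i.

    i is a start position in query and j a start position in target.
    Only k-tuples that actually occur in target are visited, which avoids
    comparing every position pair.
    """
    index = ktuple_index(target, k)
    counts = Counter()
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], ()):
            counts[j - i] += 1
    return counts


if __name__ == "__main__":
    print(diagonal_counts("ACGT", "TTACGT", 2))
```

In the usage example, all three shared 2-tuples fall on the same diagonal, reflecting that the query matches the target at a single offset.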
Improving the efficiency of dot-matrix similarity searches through use of an oligomer table
- Biology
- Nucleic Acids Res.
- 1986
The algorithm described finds similarities between two sequences of lengths M and N, comparing L residues at a time, with an efficiency of L × M × N / S^k, where S is the alphabet size and k is the length of the oligomer.
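As a rough worked example of the cost expression in the summary above, taken here to be L × M × N / S^k with k as an exponent (an assumption on our part about the intended notation), the sketch below compares it with the unindexed cost L × M × N. The function names and the sample values are illustrative, not taken from the paper.

```python
def naive_cost(L, M, N):
    """Residue comparisons when every position pair is examined."""
    return L * M * N


def oligomer_table_cost(L, M, N, S, k):
    """Expected comparisons assuming the form L * M * N / S**k.

    S is the alphabet size and k the oligomer length; S**k is the number
    of distinct oligomers, so only about 1 in S**k position pairs share
    an oligomer and need the full L-residue comparison.
    """
    return L * M * N / S ** k


if __name__ == "__main__":
    # Hypothetical values: two 10 kb DNA sequences, window 20, oligomer length 6.
    L, M, N, S, k = 20, 10_000, 10_000, 4, 6
    print(naive_cost(L, M, N))
    print(oligomer_table_cost(L, M, N, S, k))
```

With these sample values the expected work drops by a factor of 4^6 = 4096 under the assumed formula.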
Rapid and sensitive protein similarity searches.
- Biology
- Science
- 1985
An algorithm was developed that facilitates the search for similarities between newly determined amino acid sequences and sequences already available in databases, and that increases sensitivity by giving high scores to amino acid replacements that occur frequently in evolution.
Enhanced graphic matrix analysis of nucleic acid and protein sequences.
- Biology
- Proceedings of the National Academy of Sciences of the United States of America
- 1981
Computer translation of nucleic acid sequences into all possible amino acid sequences followed by graphic matrix analysis provides a way to detect the most likely protein encoding regions and can predict the correct reading frames in sequences in which splicing patterns are not defined.
An interactive graphics program for comparing and aligning nucleic acid and amino acid sequences.
- Computer Science
- Nucleic Acids Research
- 1982
A computer program designed to look for similarities between pairs of nucleic acid or amino acid sequences; its built-in editing functions can be used to make insertions and so produce alignments of the two sequences.
Efficient algorithms for folding and comparing nucleic acid sequences
- Biology
- Nucleic Acids Res.
- 1982
The homology and secondary structure programs are illustrated, respectively, with a comparison of two phage genomes and a discussion of Drosophila melanogaster 5S RNA folding.
Similarity and Homology
- Biology
- 1991
The volume of molecular sequence data has long since surpassed human information processing capacity for even simple tasks such as searching for related sequences, and with the ever increasing rate at which new sequences are being produced, the need for computer-assisted analysis becomes more and more acute.

