Testing statistical significance scores of sequence comparison methods with structure similarity
@article{Hulsen2006TestingSS,
  title={Testing statistical significance scores of sequence comparison methods with structure similarity},
  author={Tim Hulsen and Jacob de Vlieg and Jack A. M. Leunissen and Peter M. A. Groenen},
  journal={BMC Bioinformatics},
  year={2006},
  volume={7},
  pages={444}
}
Background: In the past years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search is not only determined by the algorithm but also by the statistical significance testing for an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been…
19 Citations
TULIP software and web server : automatic classification of protein sequences based on pairwise comparisons and Z-value statistics
- Computer Science
- 2009
A web server is developed that allows local or online computation of TULIP trees based on the CSHP probabilities, enabling a classification of protein sequences based on pairwise alignments and following evolutionary assumptions.
Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores
- Biology
- BMC Bioinformatics
- 2007
A model of sequence evolution based on aging, in the sense of Reliability Theory, is built from a sequence alignment score, using the fact that the amount of information shared between an initial sequence and the sequences in its lineage is a decreasing function of time.
Normalized global alignment for protein sequences.
- Computer Science
- Journal of theoretical biology
- 2011
Island method for estimating the statistical significance of profile-profile alignment scores
- Biology
- BMC Bioinformatics
- 2008
The island statistics can be generalized to profile-profile alignments to provide an efficient method for the alignment score normalization and has a clear speed advantage over the direct shuffling method for comparable accuracy in parameter estimates.
Algorithms in comparative genomics
- Biology
- 2010
The author has studied and established a simple prescription for obtaining a better phylogeny: the underlying alignments used in phylogeny reconstruction are improved by refining Gotoh's iterative heuristic to iterate with maximum parsimony guide-trees.
Searching for evolutionary distant RNA homologs within genomic sequences using partition function posterior probabilities
- Biology
- BMC Bioinformatics
- 2007
It is demonstrated, for the first time, that partition function match probabilities used for expected accuracy alignment, as done in Probalign, provide statistically significant improvement over current approaches for identifying distantly related RNA sequences in larger genomic segments.
Enhanced Sequence-Based Function Prediction Methods and Application to Functional Similarity Networks
- Biology
- 2011
The network structure of the gene functional space, built by connecting proteins with functional similarity, is discussed and compared with the structures of protein-protein interaction networks and metabolic pathway networks, to which it is similar.
Algorithms for the study of RNA and protein structure
- Computer Science
- 2010
A system to automatically generate two-dimensional representations of protein structure that are particularly useful in analysing complex protein folds and a method for using these diagrams as an interface to the protein substructure search methods.
Graph-based methods for large-scale protein classification and orthology inference
- Biology
- 2009
It is argued that establishing true orthologous relationships requires a phylogenetic approach which combines both trees and graphs (networks), reliable species phylogeny, genomic data for more than two species, and an insight into the processes of molecular evolution.
Ranking MEDLINE documents
- Medicine
- Journal of the Brazilian Computer Society
- 2013
A new methodology is developed that enables the automation of the assessment process based on a multi-criteria ranking function that contemplates six factors and seems appropriate to retrieve relevant papers out of a huge repository such as MEDLINE.
References
Comparative accuracy of methods for protein sequence similarity search
- Biology
- Bioinform.
- 1998
Probabilistic Smith-Waterman (PSW), which is based on Hidden Markov models for a single sequence using a standard scoring matrix, and a new version of BLAST (WU-BLAST2), which uses Sum statistics for gapped alignments, are compared.
Statistical evaluation of pairwise protein sequence comparison with the Bayesian bootstrap
- Computer Science
- Bioinform.
- 2005
An unbiased statistical evaluation based on the Bayesian bootstrap, a resampling method operationally similar to the standard bootstrap, is developed, showing that the underlying structure within benchmark databases causes Efron's standard, non-parametric bootstrap to be biased.
Improved tools for biological sequence comparison.
- Biology, Computer Science
- Proceedings of the National Academy of Sciences of the United States of America
- 1988
Three computer programs for comparisons of protein and DNA sequences can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity.
Comparison of methods for searching protein sequence databases
- Computer Science
- Protein science : a publication of the Protein Society
- 1995
Search sensitivity with either the Smith‐Waterman algorithm or FASTA is significantly improved by using modern scoring matrices, such as BLOSUM45–55, and optimized gap penalties instead of the conventional PAM250 matrix.
Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms.
- Biology
- Genomics
- 1991
Significance of Z-value Statistics of Smith-Waterman Scores for Protein Alignments
- Biology
- Comput. Chem.
- 1999
Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements.
- Computer Science
- Nucleic acids research
- 2001
The use of composition-based statistics is particularly beneficial for large-scale automated applications of PSI-BLAST, and the use, for each database sequence, of a position-specific scoring system tuned to that sequence's amino acid composition.
Assessing sequence comparison methods with the average precision criterion
- Computer Science
- Bioinform.
- 2003
This work finds that the low-complexity segment filtration procedure in BLAST actually harms its overall search quality, and that the AP scores of different search methods are approximately proportional to the logarithm of search time.
Fundamentals of massive automatic pairwise alignments of protein sequences: theoretical significance of Z-value statistics
- Biology
- Bioinform.
- 2004
This study provides the missing theoretical link between a Z-value cut-off used for an automatic clustering of putative orthologs and/or paralogs, and the corresponding statistical risk in such genome-scale comparisons (using non-biased or biased genomes).
Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods.
- Biology
- Journal of molecular biology
- 1998
The extent to which the SAM-T98 implementation of a hidden Markov model procedure; PSI-BLAST; and the intermediate sequence search (ISS) procedure can detect evolutionary relationships between the members of the sequence database PDBD40-J is determined.