Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids

Richard Durbin, Sean R. Eddy, Anders Krogh, Graeme J. Mitchison
Probabilistic models are becoming increasingly important in analyzing the huge amount of data being produced by large-scale DNA-sequencing efforts such as the Human Genome Project. For example, hidden Markov models are used for analyzing biological sequences, linguistic-grammar-based probabilistic models for identifying RNA secondary structure, and probabilistic evolutionary models for inferring phylogenies of sequences from different organisms. This book gives a unified, up-to-date and self…
Modelling Biological Sequences by Grammatical Inference
This tutorial offers a quick introduction to the world of biological macromolecules and surveys the grammatical-inference approaches that have been developed in bioinformatics to model these sequences, from well-established weighting schemes for profile HMMs and stochastic context-free grammars to approaches that learn the structure or topology of the grammars.
A probabilistic model for the evolution of RNA structure
This work considers a highly simplified evolutionary model for RNA, called "The TKF91 Structure Tree" (following Thorne, Kishino and Felsenstein's 1991 model of sequence evolution with indels), which is implemented for pairwise alignment as proof of principle for such an approach.
Advances in Hidden Markov Models for Sequence Annotation
HMMs have been used in so many contexts over the course of the last fifteen years that they almost require no introduction, and a variety of new challenges have come to the fore in the algorithmic analysis of HMMs.
Hidden Markov models for remote protein homology detection
Improvements to protein homology detection methods are described, including improvements to profile HMMs that are used in database searches to identify homologous protein sequences that belong to the same protein family.
Parametric inference for biological sequence analysis.
  • L. Pachter, B. Sturmfels
  • Computer Science
    Proceedings of the National Academy of Sciences of the United States of America
  • 2004
The polytope propagation algorithm for computing the Newton polytope of an observation from a graphical model is introduced; it is a geometric version of the sum-product algorithm and is used to analyze the parametric behavior of maximum a posteriori inference calculations for graphical models.
Inferring function from homology.
A pipeline of freely available Web-based tools to analyze protein-coding DNA sequences of unknown function is proposed and accumulated information obtained during each step of the pipeline is used to build a testable hypothesis of function.
Evolutionary Triplet Models of Structured RNA
A “transducer composition” algorithm is described for extending pairwise probabilistic models of RNA structural evolution to models of multiple sequences related by a phylogenetic tree, together with dynamic programming algorithms, robust to null cycles and empty bifurcations, for parsing this grammar.
Web-based bioinformatic resources for protein and nucleic acids sequence alignment
This review describes the online resources currently available for protein and nucleic acid sequence alignment. Certain methods for analyzing genetic and protein data have been found to be extremely computationally intensive, which motivates the use of powerful computers.
Bioinformatic tools for DNA/protein sequence analysis, functional assignment of genes and protein classification
  • B. Rehm
  • Biology
    Applied Microbiology and Biotechnology
  • 2001
This review intends to provide a guide to choosing the most efficient way to analyze a new sequence or to collect information on a gene or protein of interest by applying current publicly available databases and Web services.
Probabilistic Methods for Computational Annotation of Genomic Sequences
A new method that uses protein profiles that can be generated from a set of related proteins to improve the accuracy of present gene prediction methods is introduced, and the specific models used in the presented methods are described.