A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more.

Elena Rivas, Raymond W. Lang, Sean R. Eddy. Volume 18, issue 2.
The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been… 

Evaluation of a sophisticated SCFG design for RNA secondary structure prediction

Investigations on both the accuracies of predicted foldings and the overall quality of generated sample sets yield the conclusion that the Boltzmann distribution of the PF sampling approach is more centered than the ensemble distribution induced by the sophisticated SCFG model, which implies a greater structural diversity within generated samples.

Improving RNA Branching Predictions: Advances and Limitations

A branch-and-bound algorithm is developed that finds the set of optimal parameters with the highest average accuracy for a given set of sequences and shows that the previous ad hoc parameters are nearly optimal for tRNA and 5S rRNA sequences on both training and testing sets.

RNA Structure Prediction

  • J. Iwakiri, K. Asai
  • Chemistry
    Encyclopedia of Bioinformatics and Computational Biology
  • 2019

A max-margin training of RNA secondary structure prediction integrated with the thermodynamic model

A novel algorithm for RNA secondary structure prediction is proposed that integrates the thermodynamic approach with the machine-learning-based weighted approach; it achieves the best prediction accuracy compared with existing methods, and no heavy overfitting is observed.

Analysis of RNA nearest neighbor parameters reveals interdependencies and quantifies the uncertainty in RNA secondary structure prediction.

This work demonstrated that the precision of RNA secondary structure prediction is more robust than suggested by previous work based on perturbation of the nearest neighbor parameters, due to correlations between parameters.

The four ingredients of single-sequence RNA secondary structure prediction. A unifying perspective

Modeling RNA secondary structure by using intrinsic sequence-based plausible “foldability” will require the incorporation of other forms of information in order to constrain the folding space and to improve prediction accuracy, which could give an advantage to probabilistic scoring systems.

Stochastic k-Tree Grammar and Its Application in Biomolecular Structure Modeling

It is shown, for the first time, that probabilistic analysis of k-trees over strings is computable in polynomial time n^O(k), which permits not only modeling of biomolecular tertiary structures but also efficient analysis and prediction of such structures.

RNA secondary structure prediction using deep learning with thermodynamic integration

A new algorithm for predicting RNA secondary structures uses deep learning with thermodynamic integration, enabling robust predictions, and introduces a new regularization for training the authors' deep neural network without overfitting it to the training data.



Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction

Four SCFG designs had prediction accuracies near the performance of current energy minimization programs, and one of these designs, introduced by Knudsen and Hein in their PFOLD algorithm, has only 21 free parameters and is significantly simpler than the others.

CONTRAfold: RNA secondary structure prediction without physics-based models

CONTRAfold, a novel secondary structure prediction method based on conditional log-linear models (CLLMs), a flexible class of probabilistic models that generalize SCFGs by using discriminative training and feature-rich scoring, achieves the highest single-sequence prediction accuracies to date.

Improved RNA secondary structure prediction by maximizing expected pair accuracy.

A program, MaxExpect, that predicts RNA secondary structure by maximizing the expected base-pair accuracy, is reported, and the average PPV of optimal structure is improved from 66% to 68% at the same sensitivity level compared with free energy minimization.

RNA secondary structure prediction based on free energy and phylogenetic analysis.

A computational method for the prediction of RNA secondary structure that uses a combination of free energy and comparative sequence analysis strategies and indicates that prediction accuracy most strongly depends upon covariational information and only weakly on the energetic terms.

Analysis of the Free Energy in a Stochastic RNA Secondary Structure Model

  • M. Nebel, Anika Scheid
  • Computer Science
    IEEE/ACM Transactions on Computational Biology and Bioinformatics
  • 2011
The stochastic model for RNA secondary structures presented in this work has, for example, been used as the basis of a new algorithm for the (nonuniform) generation of random RNA secondary structures.

RNA secondary structure prediction by centroids in a Boltzmann weighted ensemble.

A novel method forsakes the free energy minimization paradigm in favor of predictions based on a Boltzmann-weighted structure ensemble, introduces the notion of a centroid structure as a representative for a set of structures, and describes a procedure for its identification.

Computational approaches for RNA energy parameter estimation.

A novel linear Gaussian Bayesian network is proposed that models feature relationships and makes effective use of sparse data by sharing statistical strength between parameters; significant improvements in the accuracy of RNA minimum free-energy pseudoknot-free secondary structure prediction are obtained.

Dynalign: an algorithm for finding the secondary structure common to two RNA sequences.

Dynalign is a new computer algorithm that improves the accuracy of structure prediction by combining free energy minimization and comparative sequence analysis to find a low free energy structure common to two sequences without requiring any sequence identity.

Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure.

An improved dynamic programming algorithm is reported for RNA secondary structure prediction by free energy minimization. Experimental constraints, derived from enzymatic and flavin mononucleotide cleavage, improve the accuracy of structure predictions.

Rich Parameterization Improves RNA Structure Prediction

It is shown that the application of more detailed models indeed improves prediction quality, while the corresponding running time of the folding algorithm remains fast, and a new RNA folding prediction model is proposed, which results in a significantly higher prediction quality than that of previous models.