Using Structural and Evolutionary Information to Detect and Correct Pyrosequencing Errors in Noncoding RNAs

Vladimir Reinharz, Yann Ponty, Jérôme Waldispühl. Journal of Computational Biology: a journal of computational molecular cell biology, vol. 20, no. 11.
The analysis of the sequence-structure relationship in RNA molecules is essential not only for evolutionary studies but also for concrete applications such as error correction in next-generation sequencing (NGS) technologies. The prohibitive sizes of the mutational and conformational landscapes, combined with the volume of data to process, require efficient algorithms to compute sequence-structure properties. In this article, we address the correction of NGS errors by calculating which…
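The scale of the mutational landscape can be made concrete by counting point mutants directly. This is a small sketch that assumes errors are substitutions only, over the alphabet {A, C, G, U}, so insertions and deletions are not counted. `mutants_within` is a hypothetical helper and is not part of the authors' software.

```python
from math import comb

def mutants_within(n: int, k: int) -> int:
    """Count RNA sequences of length n within Hamming distance k of a fixed
    reference: choose i positions to mutate, with 3 alternative nucleotides each.
    Substitutions only (hypothetical simplification: no indels)."""
    return sum(comb(n, i) * 3**i for i in range(k + 1))

print(mutants_within(100, 1))  # 301
print(mutants_within(100, 2))  # 44851
print(4**100)                  # size of the whole sequence space for n = 100
```

With only two substitutions allowed, a 100-nt sequence already has tens of thousands of neighbors. The count grows roughly as $\binom{n}{k}3^k$, and the conformational landscape of each mutant has to be explored on top of that.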


Combining structure probing data on RNA mutants with evolutionary information reveals RNA-binding interfaces
A formal framework is introduced to combine the biochemical signal collected from MaM experiments with the evolutionary information available in multiple sequence alignments. Neutral theory principles for detecting complex long-range dependencies between nucleotides of a single-stranded RNA are implemented in software called aRNhAck.
Ensemble Algorithms and Analytic Combinatorics in RNA Bioinformatics and Beyond. (Algorithmes ensemblistes et combinatoire analytique en Bioinformatique des ARN)
This work describes a complementary line of research that relies on a deep-rooted connection between combinatorics and dynamic programming. This connection enables an exchange of ideas between the fields of Bioinformatics and Enumerative Combinatorics. The exchange motivates new analyses of the search spaces of combinatorial problems, opens new applications in Bioinformatics, and allows the formulation of novel biological questions.
An Unambiguous And Complete Dynamic Programming Algorithm For Tree Alignment
This paper presents the first unambiguous and complete dynamic programming algorithm for the alignment of a pair of ordered rooted trees. The algorithm optimally aligns two trees of sizes $n_1$ and $n_2$ in $\Theta(n_1 n_2)$ time in the worst case.


An Unbiased Adaptive Sampling Algorithm for the Exploration of RNA Mutational Landscapes Under Evolutionary Pressure
An unbiased adaptive sampling algorithm is introduced that enables RNAmutants to sample regions of the mutational landscape that classical algorithms cover poorly. It is shown that low G+C content favors the appearance of internal loops, and thus possibly the formation of tertiary structure motifs.
Efficient Algorithms for Probing the RNA Mutation Landscape
This paper generalizes the McCaskill partition function algorithm to sum over the grand canonical ensemble of all secondary structures of all mutants of the given sequence, and provides evidence that the 3′ UTR of the GB RNA virus C has been optimized to preserve evolutionarily conserved stem regions from a deleterious effect of pointwise mutations.
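The following is a minimal sketch of the idea of summing a partition function over all mutants. It assumes a toy Nussinov-style energy model in which every canonical or wobble base pair contributes the same Boltzmann factor `pair_weight`. There are no stacking, loop, or Turner energies, so this is not McCaskill's or RNAmutants' actual model. The table is indexed by the exact number of mutations k, and each entry sums over all sequences at Hamming distance k and all of their secondary structures. The function and parameter names are hypothetical.

```python
import math
from functools import lru_cache

# Canonical Watson-Crick and wobble pairs (assumption: no other pairs allowed).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def mutant_partition_function(seq, max_mut, pair_weight=math.exp(1.0), min_loop=3):
    """Return [Z_0, ..., Z_max_mut], where Z_k sums pair_weight ** (#pairs) over
    every sequence with exactly k point mutations from `seq` and every secondary
    structure of it. Toy model (hypothetical): uniform weight per base pair,
    hairpins enclose at least `min_loop` unpaired bases."""

    @lru_cache(maxsize=None)
    def Z(i, j, k):
        # Weight of all (subsequence, structure) pairs on seq[i..j] with exactly k mutations.
        if k < 0:
            return 0.0
        if i > j:  # empty interval: one empty sequence with one empty structure
            return 1.0 if k == 0 else 0.0
        # Case 1: position j is unpaired; keep the reference base or use one of 3 others.
        total = Z(i, j - 1, k) + 3 * Z(i, j - 1, k - 1)
        # Case 2: position j pairs with l, leaving at least min_loop bases inside.
        for l in range(i, j - min_loop):
            for x, y in PAIRS:
                m = (x != seq[l]) + (y != seq[j])  # mutations spent on the pair itself
                for k1 in range(k - m + 1):        # split remaining mutations left/inside
                    total += pair_weight * Z(i, l - 1, k1) * Z(l + 1, j - 1, k - m - k1)
        return total

    return [Z(0, len(seq) - 1, k) for k in range(max_mut + 1)]
```

Dividing each `Z[k]` by `sum(Z)` gives the toy model's distribution of mutation counts. The recursion is unambiguous, so each (sequence, structure) pair is counted exactly once. It runs in $O(n^3 K^2)$ time for $K$ = `max_mut`.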
A global sampling approach to designing and reengineering RNA secondary structures
This article presents RNA-ensign, a novel paradigm for RNA design that uses an efficient global sampling algorithm to explore large regions of the mutational landscape under structural and thermodynamic constraints until a solution is found.
Frequency and isostericity of RNA base pairs
A quantitative measure of base pair isostericity, the IsoDiscrepancy Index (IDI), is introduced to more accurately determine which base pair substitutions can potentially occur in conserved motifs.
Geometric nomenclature and classification of RNA base pairs.
This work proposes a classification based on the observation that the planar edge-to-edge, hydrogen-bonding interactions between RNA bases involve one of three distinct edges: the Watson-Crick edge, the Hoogsteen edge, and the Sugar edge, which facilitates the recognition of recurrent three-dimensional motifs from comparison of homologous sequences.
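The three edges named above combine with a cis or trans orientation of the glycosidic bonds. Counting each edge pair once regardless of order, this yields the 12 geometric families of the Leontis-Westhof scheme. The short enumeration below uses the customary abbreviations such as cWW. The helper name is hypothetical.

```python
from itertools import combinations_with_replacement

# The three interacting edges of a nucleotide base.
EDGES = ("Watson-Crick", "Hoogsteen", "Sugar")
# Relative orientation of the glycosidic bonds of the two paired bases.
ORIENTATIONS = ("cis", "trans")

def base_pair_families():
    """Return (short_name, orientation, edge1, edge2) for each geometric family,
    treating the edge pair as unordered (Watson-Crick/Hoogsteen == Hoogsteen/Watson-Crick)."""
    families = []
    for orientation in ORIENTATIONS:
        for e1, e2 in combinations_with_replacement(EDGES, 2):
            short = orientation[0] + e1[0] + e2[0]  # e.g. "cWW", "tHS"
            families.append((short, orientation, e1, e2))
    return families

for short, orientation, e1, e2 in base_pair_families():
    print(f"{short}: {orientation} {e1}/{e2}")
```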
Ribosomal RNA: a key to phylogeny
G. Olsen, C. Woese. FASEB Journal: official publication of the Federation of American Societies for Experimental Biology, 1993.
The role of the rRNAs and some of the insights that have been gained from them are reviewed, and the importance of comparing results from multiple molecules is stressed as a method for testing the overall reliability of the organismal phylogeny.
Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences
Cd-hit-2d compares two protein datasets and reports similar matches between them; cd-hit-est clusters a DNA/RNA sequence database, and cd-hit-est-2d compares two nucleotide datasets.
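CD-HIT clusters sequences by greedy incremental assignment. Sequences are processed from longest to shortest, and each one joins the first existing representative it matches above an identity threshold; otherwise it founds a new cluster. The sketch below keeps that outline but makes one simplifying assumption: identity is scored as longest-common-subsequence length divided by the length of the shorter sequence. It also omits CD-HIT's short-word filter and banded alignment. All names are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence (dynamic programming, O(len(a)*len(b)))."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def identity(a: str, b: str) -> float:
    """Stand-in identity (assumption): LCS length over the shorter sequence's length."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / min(len(a), len(b))

def greedy_cluster(seqs, threshold=0.9):
    """Greedy incremental clustering: longest sequences are considered first, and each
    sequence joins the first representative it matches, else starts a new cluster."""
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]), reverse=True)
    reps = []      # indices of representative sequences, in creation order
    clusters = {}  # representative index -> member indices
    for i in order:
        for r in reps:
            if identity(seqs[i], seqs[r]) >= threshold:
                clusters[r].append(i)
                break
        else:
            reps.append(i)
            clusters[i] = [i]
    return clusters
```

For instance, `greedy_cluster(["ACGUACGUAC", "ACGUACGUA", "GGGGCCCCUU"], 0.9)` places the second sequence in the first sequence's cluster and leaves the third on its own.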
Molecules as documents of evolutionary history.
Rfam: Wikipedia, clans and the “decimal” release
The pros and cons of using the online encyclopedia Wikipedia as a source of community-derived annotation are discussed, as is the addition of clans, which group related RNA families.
An approximate matching algorithm for finding (sub-)optimal sequences in S-attributed grammars
A basic algorithm is presented which, given a grammar $G$ and a sequence $\omega$, computes the optimal attribute for all (approximate) strings $\omega' \in L(G)$ such that $d(\omega, \omega') \le M$. Its complexity is $O(n^{r+1})$ in time and $O(n^2)$ in space.