bpRNA: large-scale automated annotation and analysis of RNA secondary structure
While RNA secondary structure prediction from sequence data has made remarkable progress, there is a need for improved strategies for annotating the features of RNA secondary structures. Here we
LinearPartition: linear-time approximation of RNA folding partition function and base-pairing probabilities
This paper designs a linear-time heuristic algorithm, LinearPartition, that approximates the partition function and base-pairing probabilities and is shown to be orders of magnitude faster than Vienna RNAfold and CONTRAfold (e.g., 2.5 days versus 1.3 min).
CoV-Seq, a New Tool for SARS-CoV-2 Genome Analysis and Visualization: Development and Usability Study
Background: COVID-19 became a global pandemic not long after its identification in late 2019. The genomes of SARS-CoV-2 are being rapidly sequenced and shared on public repositories. To keep up with
LinearDesign: Efficient Algorithms for Optimized mRNA Sequence Design
This work provides efficient computational tools to speed up and improve mRNA vaccine development. It develops two algorithms for incorporating codon optimality into the design: one based on k-best parsing to find alternative sequences, and one that incorporates codon optimization directly into the dynamic programming.
ThreshKnot: Thresholded ProbKnot for Improved RNA Secondary Structure Prediction
It is suggested that ThreshKnot should replace MEA as the default partition function-based structure prediction algorithm in RNA structure prediction because of its higher structure prediction accuracy, its capability to predict pseudoknots, and its faster runtime and easier implementation.
CoV-Seq: SARS-CoV-2 Genome Analysis and Visualization
Summary: COVID-19 has become a global pandemic not long after its inception in late 2019. SARS-CoV-2 genomes are being sequenced and shared on public repositories at a fast pace. To keep up with these
Learning to Fold RNAs in Linear Time
This work presents a linear-time, machine learning-based folding system that uses the recently proposed approximate folding tool LinearFold as its inference engine and structured SVM (sSVM) as its training algorithm. It also introduces a max-violation update strategy to remedy the non-convergence of naive sSVM with inexact search inference.
LinearTurboFold: Linear-Time Global Prediction of Conserved Structures for RNA Homologs with Applications to SARS-CoV-2
LinearTurboFold is a linear-time algorithm that is orders of magnitude faster, making it the first method to simultaneously fold and align whole genomes of SARS-CoV-2 variants, the longest known RNA virus (∼30 kilobases).
LinearFold: Linear-Time Prediction of RNA Secondary Structures
This work designs the first linear-time algorithm for secondary structure prediction in genome-wide applications. It can be used with both thermodynamic and machine-learned scoring functions and results in even higher overall accuracy on a diverse database of sequences with known structures.
Improved and Linear-Time Stochastic Sampling of RNA Secondary Structure with Applications to SARS-CoV-2
LinearSampling is the first RNA structure sampling algorithm to scale up to the full genome of SARS-CoV-2 without local window constraints, taking only 69.2 seconds on its reference sequence, and correlates well with the experimentally guided structures.