Fast and accurate approximate inference of transcript expression from RNA-seq data
@article{Hensman2014FastAA,
  title={Fast and accurate approximate inference of transcript expression from RNA-seq data},
  author={James Hensman and Panagiotis Papastamoulis and Peter Glaus and Antti Honkela and Magnus Rattray},
  journal={Bioinformatics},
  year={2014},
  volume={31},
  pages={3881--3889}
}
Motivation: Assigning RNA-seq reads to their transcript of origin is a fundamental task in transcript expression estimation. Where ambiguities in assignments exist due to transcripts sharing sequence, e.g. alternative isoforms or alleles, the problem can be solved through probabilistic inference. Bayesian methods have been shown to provide accurate transcript abundance estimates compared with competing methods. However, exact Bayesian inference is intractable and approximate methods such as…
35 Citations
Bayesian estimation of differential transcript usage from RNA-seq data
- Computer Science · Statistical applications in genetics and molecular biology
- 2017
The use of cjBitSeq, a previously introduced Bayesian model originally designed for identifying changes in overall expression levels, is extended to the DTU context, and a Bayesian version of DRIMSeq, a frequentist model for inferring DTU, is proposed.
Perplexity: evaluating transcript abundance estimation in the absence of ground truth
- Biology · WABI
- 2021
This study is the first to make possible model selection for transcript abundance estimation on experimental data in the absence of ground truth, and derives perplexity from the analogous metric used to evaluate language and topic models and extends the metric to carefully account for corner cases unique to RNA-seq.
Polee: RNA-Seq analysis using approximate likelihood
- Computer Science · bioRxiv
- 2020
This work proposes a new method of approximating the likelihood function of a sparse mixture model, using a technique the authors call the Pólya tree transformation, and demonstrates that substituting this approximation for the real thing achieves most of the benefits with a fraction of the computational costs, leading to more accurate detection of differential transcript expression.
A Bayesian model selection approach for identifying differentially expressed transcripts from RNA sequencing data
- Computer Science · Journal of the Royal Statistical Society. Series C, Applied statistics
- 2018
A hierarchical Bayesian model builds on the BitSeq framework and the posterior distribution of transcript expression and differential expression is inferred by using Markov chain Monte Carlo sampling, and it is shown that the model proposed enjoys conjugacy for fixed dimension variables; thus the full conditional distributions are analytically derived.
Deriving Ranges of Optimal Estimated Transcript Expression due to Nonidentifiability
- Biology · bioRxiv
- 2019
Methods are proposed to calculate a “confidence range of expression” for each transcript, representing its possible abundance across equally optimal estimates for both quantification models; the range indicates both whether a transcript has potential estimation error due to non-identifiability and the extent of that error.
Fast and accurate quantification and differential analysis of transcriptomes
- Biology
- 2016
Improvements to both abundance estimation and differential expression analysis are presented, showing dramatic improvements to the speed of abundance estimation while maintaining accuracy, and a differential expression model is developed incorporating the uncertainty introduced by abundance estimation.
Detecting anomalies in RNA-seq quantification
- Computer Science · bioRxiv
- 2019
This work develops a computational method to detect instances where a quantification model could not thoroughly explain the input, and identifies transcripts where the read coverage has significant deviations from the expectation.
Finding ranges of optimal transcript expression quantification in cases of non-identifiability
- Computer Science
- 2019
Methods are proposed to compute the range of equally optimal estimates for the expression of each transcript, accounting for non-identifiability of the quantification model using several novel graph theoretical approaches.
Combining Multiple RNA-Seq Data Analysis Algorithms Using Machine Learning Improves Differential Isoform Expression Analysis
- Computer Science · Methods and protocols
- 2021
A novel integrative approach is developed that effectively combines the most widely used algorithms for differential transcript and isoform analysis using state-of-the-art machine learning techniques, and the study concludes that the strategy outperforms the application of the individual algorithms.
Improved data-driven likelihood factorizations for transcript abundance estimation
- Computer Science · Bioinform.
- 2017
This work demonstrates that model simplifications adopted by certain abundance estimation methods can lead to a diminished ability to accurately estimate the abundances of highly related transcripts, and shows that such approaches can achieve accuracy nearly indistinguishable from methods that consider the complete (i.e. per‐fragment) likelihood, while retaining the computational efficiency of the compatibility‐based factorizations.
References (10 of 36 shown)
Improved variational Bayes inference for transcript expression estimation
- Computer Science · Statistical applications in genetics and molecular biology
- 2014
In this paper, variational Bayesian techniques are used in order to approximate the posterior distribution of transcript expression and a novel approach is introduced which integrates the latent allocation variables out of the VB approximation.
TIGAR: transcript isoform abundance estimation method with gapped alignment of RNA-Seq data by variational Bayesian inference
- Biology · Bioinform.
- 2013
A statistical method to estimate transcript isoform abundances from RNA-Seq data that optimizes the number of transcript isoforms by variational Bayesian inference through an iterative procedure, and its convergence is guaranteed under a stopping criterion.
Identifying differentially expressed transcripts from RNA-seq data with biological variation
- Computer Science · Bioinform.
- 2012
A novel method for DE analysis across replicates is proposed which propagates uncertainty from the sample-level model while modelling biological variance using an expression-level-dependent prior, and the advantages of this method are demonstrated.
RNA-Seq gene expression estimation with read mapping uncertainty
- Biology · Bioinform.
- 2010
Simulations with the method indicate that a read length of 20–25 bases is optimal for gene-level expression estimation from mouse and maize RNA-Seq data when sequencing throughput is fixed, and the method is capable of modeling non-uniform read distributions.
Statistical inferences for isoform expression in RNA-Seq
- Biology · Bioinform.
- 2009
The results show that isoform expression inference in RNA-Seq is possible by employing appropriate statistical methods and statistical inferences are obtained from the posterior distribution by importance sampling.
RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome
- Biology · BMC Bioinformatics
- 2011
It is shown that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads, and estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene.
Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation.
- Biology · Nature biotechnology
- 2010
The results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.
Analysis and design of RNA sequencing experiments for identifying isoform regulation
- Biology · Nature Methods
- 2010
The mixture-of-isoforms (MISO) model is developed, a statistical model that estimates expression of alternatively spliced exons and isoforms and assesses confidence in these estimates, providing a probabilistic framework for RNA-seq analysis and functional insights into pre-mRNA processing.
Mapping and quantifying mammalian transcriptomes by RNA-Seq
- Biology · Nature Methods
- 2008
Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.
Quantifying alternative splicing from paired-end RNA-sequencing data
- Biology · The annals of applied statistics
- 2014
Novel data summaries and a Bayesian modeling framework are proposed that overcome limitations and determine biases in a non-parametric, highly flexible manner; they allow alternative splicing patterns to be studied for individual samples and can also serve as the basis for downstream analyses.