• Corpus ID: 17919140

Fast Approximate Inference of Transcript Expression Levels from RNA-seq Data

Authors: James Hensman, Peter Glaus, Antti Honkela, Magnus Rattray
Published in: arXiv: Genomics
Motivation: The mapping of RNA-seq reads to their transcripts of origin is a fundamental task in transcript expression estimation and differential expression scoring. Where ambiguities in mapping exist due to transcripts sharing sequence, e.g. alternative isoforms or alleles, the problem becomes an instance of non-trivial probabilistic inference. Bayesian inference in such a problem is intractable and approximate methods must be used such as Markov chain Monte Carlo (MCMC) and Variational Bayes… 
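To make the read-assignment problem concrete, here is a minimal sketch of how ambiguous reads can be shared between transcripts. It is not the paper's method: it uses plain expectation-maximization (a maximum-likelihood point estimate) as a simpler stand-in for the MCMC and Variational Bayes approaches named in the abstract. The function name `em_read_assignment` and the input layout (one row per read, one likelihood per transcript, zero where the read is incompatible) are assumptions made for illustration only.

```python
def em_read_assignment(likelihoods, n_iter=200, tol=1e-10):
    """Estimate transcript proportions from per-read likelihoods with EM.

    likelihoods[n][k] is an assumed P(read n | transcript k); it is 0.0
    when read n cannot have come from transcript k.
    """
    n_transcripts = len(likelihoods[0])
    # Start from uniform proportions.
    theta = [1.0 / n_transcripts] * n_transcripts

    for _ in range(n_iter):
        # E-step: split each read across transcripts in proportion to
        # theta[k] * likelihood, and accumulate the expected counts.
        counts = [0.0] * n_transcripts
        for row in likelihoods:
            weights = [t * l for t, l in zip(theta, row)]
            total = sum(weights)
            if total == 0.0:
                continue  # read is compatible with no transcript; skip it
            for k, w in enumerate(weights):
                counts[k] += w / total

        n_assigned = sum(counts)
        if n_assigned == 0.0:
            raise ValueError("no read is compatible with any transcript")

        # M-step: new proportions are the normalized expected counts.
        new_theta = [c / n_assigned for c in counts]
        change = max(abs(a - b) for a, b in zip(theta, new_theta))
        theta = new_theta
        if change < tol:
            break

    return theta


# Toy data: 3 reads unique to transcript 0, 1 read unique to transcript 1,
# and 2 reads that match both transcripts equally well.
toy = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] + [[1.0, 1.0]] * 2
proportions = em_read_assignment(toy)
```

On the toy data the estimate converges to roughly 0.75 and 0.25: the two ambiguous reads are divided according to the evidence from the uniquely mapping reads. A Bayesian treatment would instead return a posterior distribution over the proportions rather than a single point estimate.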


Improved variational Bayes inference for transcript expression estimation

In this paper, variational Bayesian techniques are used in order to approximate the posterior distribution of transcript expression and a novel approach is introduced which integrates the latent allocation variables out of the VB approximation.

Salmon: Accurate, Versatile and Ultrafast Quantification from RNA-seq Data using Lightweight-Alignment

Salmon is introduced, a novel method and software tool for transcript quantification that exhibits state-of-the-art accuracy while being significantly faster than most other tools.

TIGAR2: sensitive and accurate estimation of transcript isoform expression with longer RNA-Seq reads

TIGAR2 is a sensitive and accurate tool for quantifying transcript isoform abundances from RNA-Seq data and performs better than existing methods for both fixed-length and variable-length reads, especially for reads longer than 250 bp.



TIGAR: transcript isoform abundance estimation method with gapped alignment of RNA-Seq data by variational Bayesian inference

A statistical method to estimate transcript isoform abundances from RNA-Seq data that optimizes the number of transcript isoforms by variational Bayesian inference through an iterative procedure whose convergence is guaranteed under a stopping criterion.

Identifying differentially expressed transcripts from RNA-seq data with biological variation

A novel method for DE analysis across replicates is proposed which propagates uncertainty from the sample-level model while modelling biological variance using an expression-level-dependent prior, and the advantages of this method are demonstrated.

RNA-Seq gene expression estimation with read mapping uncertainty

Simulations with the method indicate that a read length of 20–25 bases is optimal for gene-level expression estimation from mouse and maize RNA-Seq data when sequencing throughput is fixed, and the method is capable of modeling non-uniform read distributions.

Analysis and design of RNA sequencing experiments for identifying isoform regulation

The mixture-of-isoforms (MISO) model is developed, a statistical model that estimates expression of alternatively spliced exons and isoforms and assesses confidence in these estimates, providing a probabilistic framework for RNA-seq analysis and functional insights into pre-mRNA processing.

Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation.

The results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.

Differential analysis of gene regulation at transcript resolution with RNA-seq

Cuffdiff 2, an algorithm that estimates expression at transcript-level resolution and controls for variability evident across replicate libraries, robustly identifies differentially expressed transcripts and genes and reveals differential splicing and promoter-preference changes.

Mapping and quantifying mammalian transcriptomes by RNA-Seq

Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.

Haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads

A new statistical method, MMSEQ, deconvolves the mapping of reads to multiple transcripts (isoforms or haplotype-specific isoforms) and can take into account non-uniform read generation and works with paired-end reads.

An expectation-maximization algorithm for probabilistic reconstructions of full-length isoforms from splice graphs

This work establishes a general probabilistic framework for splice graph-based reconstructions of full-length isoforms and provides an expectation-maximization (EM) algorithm for its maximum likelihood solution.

Landscape of transcription in human cells

Evidence that three-quarters of the human genome is capable of being transcribed is reported, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs that prompt a redefinition of the concept of a gene.