Identifying differentially expressed transcripts from RNA-seq data with biological variation

Peter Glaus, Antti Honkela, Magnus Rattray
Pages 1721-1728
Motivation: High-throughput sequencing enables expression analysis at the level of individual transcripts. The analysis of transcriptome expression levels and differential expression (DE) estimation requires a probabilistic approach to properly account for ambiguity caused by shared exons and finite read sampling as well as the intrinsic biological variance of transcript expression. Results: We present Bayesian inference of transcripts from sequencing data (BitSeq), a Bayesian approach for… 

Bayesian inference of differentially expressed transcripts and their abundance from multi-condition RNA-seq data

A Bayesian approach to directly identify differentially expressed transcripts from RNA-seq data is developed, which features a novel joint model of the sample variability and the differential state of individual transcripts, providing insights into key signaling pathways associated with breast cancer recurrence.

Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences.

It is illustrated that while the presence of differential isoform usage can lead to inflated false discovery rates in differential expression analyses on simple count matrices and transcript-level abundance estimates improve the performance in simulated data, the difference is relatively minor in several real data sets.

Fast Approximate Inference of Transcript Expression Levels from RNA-seq Data

An approximate inference scheme based on Variational Bayes applied to an existing model of transcript expression inference from RNA-seq data is proposed, demonstrating that the increase in speed requires only a small trade-off in accuracy of expression level estimation.

Bayesian estimation of differential transcript usage from RNA-seq data

The use of cjBitSeq, a previously introduced Bayesian model originally designed for identifying changes in overall expression levels, is extended to the DTU context, and a Bayesian version of DRIMSeq, a frequentist model for inferring DTU, is proposed.

NPEBseq: nonparametric empirical bayesian-based procedure for differential expression analysis of RNA-seq data

NPEBseq can successfully detect differential expression between different conditions not only at gene level but also at exon level from RNA-seq datasets, performs significantly better than current methods, and can be applied to genome-wide RNA-sequencing datasets.

Estimation of isoform expression in RNA-seq data using a hierarchical Bayesian model

A novel hierarchical Bayesian method to estimate isoform expression using a Multinomial distribution is presented, and it helps to achieve better performance than other state-of-the-art algorithms for isoform expression estimation.

Fast and accurate approximate inference of transcript expression from RNA-seq data

This work proposes a novel approximate inference scheme based on VB and applies it to an existing model of transcript expression inference from RNA-seq data, demonstrating a significant increase in speed with only very small loss in accuracy of expression level estimation.

A Bayesian model selection approach for identifying differentially expressed transcripts from RNA sequencing data

A hierarchical Bayesian model builds on the BitSeq framework and the posterior distribution of transcript expression and differential expression is inferred by using Markov chain Monte Carlo sampling, and it is shown that the model proposed enjoys conjugacy for fixed dimension variables; thus the full conditional distributions are analytically derived.

Nonparametric expression analysis using inferential replicate counts

The proposed nonparametric model, Swish, is applied to a single-cell RNA-seq dataset to assess differential expression between sub-populations of cells, and its performance is compared with that of popular differential expression analysis methods and of the Wilcoxon test.

DEIsoM: a hierarchical Bayesian model for identifying differentially expressed isoforms using biological replicates

A novel hierarchical Bayesian model called Differentially Expressed Isoform detection from Multiple biological replicates (DEIsoM) is presented for identifying differentially expressed (DE) isoforms from multiple biological replicates representing two conditions, e.g. multiple samples from healthy and diseased subjects.



RNA-Seq gene expression estimation with read mapping uncertainty

Simulations with the method indicate that a read length of 20–25 bases is optimal for gene-level expression estimation from mouse and maize RNA-Seq data when sequencing throughput is fixed, and the method is capable of modeling non-uniform read distributions.

Estimation of alternative splicing isoform frequencies from RNA-Seq data

A novel expectation-maximization algorithm for inference of isoform- and gene-specific expression levels from RNA-Seq data, referred to as IsoEM, is based on disambiguating information provided by the distribution of insert sizes generated during sequencing library preparation, and takes advantage of base quality scores, strand and read pairing information when available.

RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

It is shown that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads, and estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene.

Isoform abundance inference provides a more accurate estimation of gene expression levels in RNA-seq.

This paper made the first attempt to compare the two strategies to estimate gene expression levels from RNA-seq data through a series of simulation studies, and showed that the isoform-based method not only gives more accurate estimates but also has less uncertainty than the UI-based strategy.

Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling

A new approach for mapping and analysing sequencing reads is introduced that yields substantially improved performance in gene expression profiling, increasing the number of transcripts that can reliably be quantified to over 40%, and extrapolations to higher sequencing depths highlight the need for efficient complementary steps.

Analysis and design of RNA sequencing experiments for identifying isoform regulation

The mixture-of-isoforms (MISO) model is developed, a statistical model that estimates expression of alternatively spliced exons and isoforms and assesses confidence in these estimates, providing a probabilistic framework for RNA-seq analysis and functional insights into pre-mRNA processing.

Mapping and quantifying mammalian transcriptomes by RNA-Seq

Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.

RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays.

It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).

Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation.

The results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.

Transcriptome and targetome analysis in MIR155 expressing cells using RNA-seq.

High-throughput multiplexed Illumina-based next-generation sequencing (NGS) provides a digital readout of absolute transcript levels and imparts a higher level of accuracy and dynamic range than microarray platforms.