RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

@article{Li2011RSEMAT,
  title={RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome},
  author={Bo Li and Colin N. Dewey},
  journal={BMC Bioinformatics},
  year={2011},
  volume={12},
  pages={323}
}
Background: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of…

QuickIsoSeq for Isoform Quantification in Large-Scale RNA Sequencing.
TLDR
This chapter describes the pipeline and details the steps required to deploy and use the QuickIsoSeq package to analyze RNA-seq datasets in practice, and discusses many QC issues, such as the abundance of rRNAs in mRNA-seq, the efficiency of globin RNA depletion in whole blood samples, and potential sample swapping.
DECONVOLUTION OF BASE PAIR LEVEL RNA-SEQ READ COUNTS FOR QUANTIFICATION OF TRANSCRIPT EXPRESSION LEVELS
TLDR
This article proposes to use individual exonic base pairs as observation units and to model nonzero as well as zero counts at all base pairs at both the transcript and gene levels, leading to the Convolution of Poisson mixture (CPM) distribution model at the gene level.
Integrative analysis with ChIP-seq advances the limits of transcript quantification from RNA-seq.
TLDR
This study developed a novel computational method, prior-enhanced RSEM (pRSEM), which uses a complementary data type in addition to RNA-seq data to transform existing capacity to precisely estimate transcript abundances, especially at the isoform level.
RNA-Skim: a rapid method for RNA-Seq quantification at transcript level
TLDR
A novel RNA-Seq quantification method, RNA-Skim, is proposed, which partitions the transcriptome into disjoint transcript clusters based on sequence similarity, and introduces the notion of sig-mers, which are a special type of k-mers uniquely associated with each cluster.
Finding the active genes in deep RNA-seq gene expression studies
TLDR
The zFPKM normalization method accurately separates the biologically relevant genes in a cell from the ultralow-expression noisy genes that have repressed promoters, providing important guidance for the design of RNA-seq studies of gene expression.
Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences.
TLDR
It is illustrated that while the presence of differential isoform usage can lead to inflated false discovery rates in differential expression analyses on simple count matrices and transcript-level abundance estimates improve the performance in simulated data, the difference is relatively minor in several real data sets.
TIGAR2: sensitive and accurate estimation of transcript isoform expression with longer RNA-Seq reads
TLDR
TIGAR2 is a sensitive and accurate tool for quantifying transcript isoform abundances from RNA-Seq data and performs better than existing methods for fixed-length and variable-length reads, especially for reads longer than 250 bp.
A Robust Method for Transcript Quantification with RNA-seq Data
TLDR
A general linear model for transcript quantification that leverages reads spanning multiple splice junctions to ameliorate identifiability problems and can infer an accurate set of dominantly expressed transcripts, whereas existing methods tend to assign positive expression to every candidate isoform.
Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences
TLDR
It is shown that gene-level abundance estimates and statistical inference offer advantages over transcript-level analyses, in terms of performance and interpretability, and an R package is provided to help users integrate transcript-level abundance estimates from common quantification pipelines into count-based statistical inference engines.
Mapping RNA‐seq Reads with STAR
TLDR
Computational protocols that produce various output files, use different RNA‐seq datatypes, and utilize different mapping strategies are described, which provide scalability for emerging sequencing technologies.
...

References

SHOWING 1-10 OF 47 REFERENCES
Accurate Estimation of Expression Levels of Homologous Genes in RNA-seq Experiments
TLDR
This work presents a rigorous alternative for handling the reads generated in an RNA-seq experiment within a probabilistic model for RNA-seq data; it develops maximum likelihood-based methods for estimating the model parameters and takes into account the fact that the DNA of the sequenced individual is not a perfect copy of the reference sequence.
TopHat: discovering splice junctions with RNA-Seq
TLDR
The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer.
Mapping and quantifying mammalian transcriptomes by RNA-Seq
TLDR
Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.
RNA-Seq gene expression estimation with read mapping uncertainty
TLDR
Simulations with the method indicate that a read length of 20–25 bases is optimal for gene-level expression estimation from mouse and maize RNA-Seq data when sequencing throughput is fixed, and the method is capable of modeling non-uniform read distributions.
Estimation of alternative splicing isoform frequencies from RNA-Seq data
TLDR
A novel expectation-maximization algorithm for inference of isoform- and gene-specific expression levels from RNA-Seq data, referred to as IsoEM, is based on disambiguating information provided by the distribution of insert sizes generated during sequencing library preparation, and takes advantage of base quality scores, strand and read pairing information when available.
Full-length transcriptome assembly from RNA-Seq data without a reference genome.
TLDR
This work presents the Trinity method for de novo assembly of full-length transcripts and evaluates it on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available, providing a unified solution for transcriptome reconstruction in any sample.
Prediction of alternative isoforms from exon expression levels in RNA-Seq experiments
TLDR
Methods that allow the prediction and quantification of alternative isoforms derived solely from exon expression levels in RNA-Seq data are developed, based on an explicit statistical model, and comprehensively address multiple aspects of alternative isoform analysis.
Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation.
TLDR
The results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.
Identification of novel transcripts in annotated genomes using RNA-Seq
TLDR
An algorithm for reference annotation-based transcript assembly is presented and it is shown how it can be used to rapidly investigate novel transcripts revealed by RNA-Seq in comparison with a reference annotation.
Isoform abundance inference provides a more accurate estimation of gene expression levels in RNA-seq.
TLDR
This paper made the first attempt to compare the two strategies for estimating gene expression levels from RNA-seq data through a series of simulation studies, and showed that the isoform-based method not only gives more accurate estimates but also has less uncertainty than the UI-based strategy.
...