Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation.

@article{Trapnell2010TranscriptAA,
  title={Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation},
  author={Cole Trapnell and Brian A. Williams and Geo Pertea and Ali Mortazavi and Gordon Kwan and Marijke J. van Baren and Steven L. Salzberg and Barbara J. Wold and Lior Pachter},
  journal={Nature Biotechnology},
  year={2010},
  volume={28},
  number={5},
  pages={511--515}
}
High-throughput mRNA sequencing (RNA-Seq) promises simultaneous transcript discovery and abundance estimation. […] To test Cufflinks, we sequenced and analyzed >430 million paired 75-bp RNA-Seq reads from a mouse myoblast cell line over a differentiation time series. We detected 13,692 known transcripts and 3,724 previously unannotated ones, 62% of which are supported by independent expression data or by homologous genes in other species. Over the time series, 330 genes showed complete switches in…
Transcript assembly and abundance estimation from RNA-Seq reveals thousands of new transcripts and switching among isoforms
TLDR
The results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.
Advancing RNA-Seq analysis
TLDR
New approaches for RNA-Seq analysis that capture genome-wide transcription and splicing in unprecedented detail are introduced, and a de novo assembly approach implemented in the ABySS software reduces the annotation problem to that of aligning full-length cDNAs, which is well handled by several algorithms.
Genome-guided transcript assembly from integrative analysis of RNA sequence data
TLDR
The Generalized RNA Integration Tool (GRIT), an automated pipeline for genome annotation that integrates RNA-seq and gene-boundary data sets, will facilitate the automated generation of high-quality genome annotations without the need for extensive manual annotation.
A new long-read RNA-seq analysis approach identifies and quantifies novel transcripts of very large genes
TLDR
Improved transcript identification and quantification demonstrated by the approach removes previous impediments to studies aimed at quantitative differential expression of ultra-long transcripts.
Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks
TLDR
This protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results; it takes less than 1 d of computer time for typical experiments and ∼1 h of hands-on time.
A long-read RNA-seq approach to identify novel transcripts of very large genes.
TLDR
Improved transcript identification and quantification shown by the approach removes previous impediments to studies aimed at quantitative differential expression of ultralong transcripts.
Integrative analysis with ChIP-seq advances the limits of transcript quantification from RNA-seq.
TLDR
This study developed a novel computational method, prior-enhanced RSEM (pRSEM), which uses a complementary data type in addition to RNA-seq data to transform existing capacity to precisely estimate transcript abundances, especially at the isoform level.
Spliced synthetic genes as internal controls in RNA sequencing experiments
TLDR
A set of spike-in RNA standards, termed 'sequins' (sequencing spike-ins), is developed; they represent full-length spliced mRNA isoforms and provide a qualitative and quantitative reference with which to navigate the complexity of the human transcriptome.
Differential analysis of gene regulation at transcript resolution with RNA-seq
TLDR
Cuffdiff 2, an algorithm that estimates expression at transcript-level resolution and controls for variability evident across replicate libraries, robustly identifies differentially expressed transcripts and genes and reveals differential splicing and promoter-preference changes.
...

References

SHOWING 1-10 OF 35 REFERENCES
Mapping and quantifying mammalian transcriptomes by RNA-Seq
TLDR
Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.
Stem cell transcriptome profiling via massive-scale mRNA sequencing
TLDR
A massive-scale RNA sequencing protocol, short quantitative random RNA libraries or SQRL, is developed, highlighting how SQRL can be used to characterize transcriptome content and dynamics in a quantitative and reproducible manner, and suggesting that the understanding of transcriptional complexity is far from complete.
TopHat: discovering splice junctions with RNA-Seq
TLDR
The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer.
Transcript length bias in RNA-seq data confounds systems biology
TLDR
Transcript length bias for calling differentially expressed genes is a general feature of current protocols for RNA-seq technology and has implications for the ranking of differentially expressed genes, and in particular may introduce bias in gene set testing for pathway analysis and other multi-gene systems biology analyses.
Alternative Isoform Regulation in Human Tissue Transcriptomes
TLDR
An in-depth analysis of 15 diverse human tissue and cell line transcriptomes on the basis of deep sequencing of complementary DNA fragments yielding a digital inventory of gene and mRNA isoform expression suggested common involvement of specific factors in tissue-level regulation of both splicing and polyadenylation.
Identifiability of isoform deconvolution from junction arrays and RNA-Seq
TLDR
Criteria are proposed that will guarantee identifiability of an isoform deconvolution model on exon and splice junction arrays and in RNA-Seq, and results show that up to 97% of alternatively spliced human genes selected from the RefSeq database lead to identifiable gene models in RNA-seq.
RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays.
TLDR
It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).
Computation for ChIP-seq and RNA-seq studies
TLDR
The multilayered analyses of ChIP-seq and RNA-seq datasets are described, the software packages currently available to perform tasks at each layer are discussed and some upcoming challenges and features for future analysis tools are described.
RNA-Seq gene expression estimation with read mapping uncertainty
TLDR
Simulations with the method indicate that a read length of 20–25 bases is optimal for gene-level expression estimation from mouse and maize RNA-Seq data when sequencing throughput is fixed, and the method is capable of modeling non-uniform read distributions.
Statistical inferences for isoform expression in RNA-Seq
TLDR
The results show that isoform expression inference in RNA-Seq is possible by employing appropriate statistical methods, and statistical inferences are obtained from the posterior distribution by importance sampling.
...