Full-length transcriptome assembly from RNA-Seq data without a reference genome.

  title={Full-length transcriptome assembly from RNA-Seq data without a reference genome.},
  author={Manfred G. Grabherr and Brian J. Haas and Moran Yassour and Joshua Z. Levin and Dawn A Thompson and Ido Amit and Xian Adiconis and Lin Fan and Raktima Raychowdhury and Qiandong Zeng and Zehua Chen and Evan Mauceli and Nir Hacohen and Andreas Gnirke and Nicholas Rhind and Federica Di Palma and Bruce W. Birren and Chad Nusbaum and Kerstin Lindblad-Toh and Nir Friedman and Aviv Regev},
  journal={Nature biotechnology},
  volume={29},
  number={7},
Massively parallel sequencing of cDNA has enabled deep and efficient probing of transcriptomes. [...] By efficiently constructing and analyzing sets of de Bruijn graphs, Trinity fully reconstructs a large fraction of transcripts, including alternatively spliced isoforms and transcripts from recently duplicated genes.
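
As a rough illustration of the de Bruijn graph idea mentioned in the abstract, the sketch below builds a graph from a few toy reads and walks it while the path is unambiguous. It is a minimal teaching example, not Trinity's implementation: the function names `build_de_bruijn` and `extend_unambiguous`, the reads, and the choice of k = 4 are all assumptions made for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Link each (k-1)-mer prefix to the (k-1)-mer suffix of every k-mer seen."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_unambiguous(graph, start):
    """Extend from `start` while each node has exactly one successor.

    Stops at a branch (several successors), a dead end, or a cycle.
    """
    path = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        path += nxt[-1]
        seen.add(nxt)
        node = nxt
    return path


# Hypothetical overlapping reads drawn from one toy transcript.
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = build_de_bruijn(reads, k=4)
contig = extend_unambiguous(graph, "ATG")
print(contig)  # ATGGCGTGCAAT
```

In a real transcriptome a node with more than one successor may reflect an alternative splice form, a paralog, or a sequencing error, and this simple walk stops at such a node rather than resolving it.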

Extension of Partial Gene Transcripts by Iterative Mapping of RNA-Seq Raw Reads

An effective in silico method for improving the contiguity of partial transcripts is presented; in the absence of a reference genome, it may be a quick and cost-effective alternative to extending them by laboratory experimentation.

Transcriptome Analysis for Non-Model Organism: Current Status and Best-Practices

An overview of state-of-the-art methods is provided, covering quality checking and pre-processing of raw reads, the pros and cons of de novo transcriptome assemblers, generation of non-redundant transcript data, and further mining of transcriptomic data for particular biological questions.

Semantic Assembly and Annotation of Draft RNAseq Transcripts without a Reference Genome

A computational workflow is proposed for the reconstruction and functional annotation of expressed gene transcripts; it does not require a reference genome sequence and tolerates low coverage, high error rates and other issues that often lead to poor de novo assembly results in studies of non-model organisms.

De novo transcript sequence reconstruction from RNA-Seq: reference generation and analysis with Trinity

This protocol describes the use of the Trinity platform for de novo transcriptome assembly from RNA-Seq data in non-model organisms and presents Trinity’s supported companion utilities for downstream applications, including RSEM for transcript abundance estimation and R/Bioconductor packages for identifying differentially expressed transcripts across samples.

SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads

The conclusion is that SOAPdenovo-Trans provides higher contiguity, lower redundancy and faster execution, compared with two other popular transcriptome assemblers.

De novo transcriptome assembly: A comprehensive cross-species comparison of short-read RNA-Seq assemblers

A large-scale comparative study in which 10 de novo assembly tools are applied to 9 RNA-Seq data sets spanning different kingdoms of life, finding that Trinity, SPAdes, and Trans-ABySS, followed by Bridger and SOAPdenovo-Trans, generally outperformed the other tools compared.

A Comparison of Next Generation Sequencing Technologies for Transcriptome Assembly and Utility for RNA-Seq in a Non-Model Bird

In the absence of a reference genome, it is found that Illumina reads alone produced a high quality transcriptome appropriate for RNA-Seq gene expression analyses.

De novo assembly of transcriptome from next-generation sequencing data

This review illustrates the overall strategy for applying the de Bruijn graph (DBG) approach and addresses in greater detail many parameters proven critical in DBG-based transcriptome assembly, including k-mer length, read coverage depth, genome complexity and the performance of different programs.

FRAMA: from RNA-seq data to annotated mRNA assemblies

A comparison with three different sources of naked mole-rat transcripts reveals that FRAMA’s gene models are better supported by RNA-seq data than any other transcript set, demonstrating the competitiveness of FRAMA to state of the art genome-based transcript reconstruction approaches.

Algorithms for transcriptome quantification and reconstruction from RNA-Seq data

Genome-guided and annotation-guided transcriptome reconstruction methods, as well as methods for transcript and gene expression level estimation, are presented, and empirical results show that the proposed methods improve transcriptome quantification and reconstruction accuracy compared to previous methods.



Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution

High-throughput sequencing of complementary DNAs (RNA-Seq) and strand-specific array data provide rich condition-specific information on novel, mostly non-coding transcripts, untranslated regions and gene structures, thus improving the existing genome annotation.

Ab initio construction of a eukaryotic transcriptome by massively parallel mRNA sequencing

This work presents a general approach for ab initio discovery of the complete transcriptome of the budding yeast, based only on the unannotated genome sequence and millions of short reads from a single massively parallel sequencing run.

De novo transcriptome assembly with ABySS

This work assembled approximately 194 million reads using ABySS into 66 921 contigs 100 bp or longer, representing over 30 million base pairs of unique transcriptome sequence, or roughly 1% of the genome.

Advancing RNA-Seq analysis

New approaches for RNA-Seq analysis that capture genome-wide transcription and splicing in unprecedented detail are introduced, and a de novo assembly approach implemented in the ABySS software reduces the annotation problem to that of aligning full-length cDNAs, which is well handled by several algorithms.

Comprehensive comparative analysis of strand-specific RNA sequencing methods

A comprehensive computational pipeline is developed to compare library quality metrics from any RNA-seq method, and the dUTP second-strand marking and Illumina RNA ligation methods are identified as the leading protocols, with the former benefitting from the current availability of paired-end sequencing.

Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation.

The results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.

Ab initio reconstruction of cell type–specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs

Scripture, a method to reconstruct the transcriptome of a mammalian cell using only RNA-Seq reads and the genome sequence, is presented and the power of ab initio reconstruction is demonstrated to render a comprehensive picture of mammalian transcriptomes.

Ab initio reconstruction of transcriptomes of pluripotent and lineage committed cells reveals gene structures of thousands of lincRNAs

Substantial variation in protein-coding genes is identified, including thousands of novel 5′-start sites, 3′-ends, and internal coding exons, and the gene structures of over a thousand lincRNA and antisense loci are determined.

TopHat: discovering splice junctions with RNA-Seq

The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer.

ALLPATHS: de novo assembly of whole-genome shotgun microreads.

A general method for genome assembly is described that can be applied to all types of DNA sequence data, including conventional sequence reads as well as short reads.