STAR: ultrafast universal RNA-seq aligner

Alexander Dobin, Carrie A. Davis, Felix Schlesinger, Jorg Drenkow, Chris Zaleski, Sonali Jha, Philippe Batut, Mark Chaisson, Thomas R. Gingeras
Volume 29, Issue 1

MOTIVATION: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. [...]

RESULTS: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential…

Systematic evaluation of spliced alignment programs for RNA-seq data

A comparison of 26 mapping protocols based on 11 programs and pipelines found major performance differences between methods on numerous benchmarks, including alignment yield, basewise accuracy, mismatch and gap placement, exon junction discovery and suitability of alignments for transcript reconstruction.

DART: a fast and accurate RNA-seq mapper with a partitioning strategy

A novel RNA-seq de novo mapping algorithm, called DART, adopts a partitioning strategy to avoid the extension step and is shown to be a highly efficient aligner that yields the highest or comparable sensitivity and accuracy compared to most state-of-the-art aligners.

Minimap2: versatile pairwise alignment for nucleotide sequences

Minimap2 does split-read alignment, employs concave gap cost for long insertions and deletions (INDELs) and introduces new heuristics to reduce spurious alignments, surpassing most aligners specialized in one type of alignment.

Interoperable RNA-Seq analysis in the cloud.

RNASequel: accurate and repeat tolerant realignment of RNA-seq reads

RNASequel, a software package that runs as a post-processing step in conjunction with an RNA-seq aligner and systematically corrects common alignment artifacts, is presented; it produces improved alignments and improves the identification of adenosine-to-inosine RNA editing sites on biological datasets.

Magic-BLAST, an accurate DNA and RNA-seq aligner for long and short reads

This work introduces Magic-BLAST, a new aligner based on ideas from the Magic pipeline that is versatile and robust to high levels of mismatches or extreme base composition, and reasonably fast.

Optimizing RNA-Seq Mapping with STAR.

The most important STAR options and parameters are described, as well as best practices for achieving the maximum mapping accuracy and speed.

Mapping RNA‐seq Reads with STAR

Computational protocols that produce various output files, use different RNA‐seq datatypes, and utilize different mapping strategies are described, which provide scalability for emerging sequencing technologies.

Minimap2: fast pairwise alignment for long nucleotide sequences

Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database and is 3-4 times faster than mainstream short-read mappers at comparable accuracy and 30 times faster at higher accuracy for both genomic and mRNA reads, surpassing most aligners specialized in one type of alignment.

RapMap: a rapid, sensitive and accurate tool for mapping RNA-seq reads to transcriptomes

It is demonstrated how quasi-mapping can be successfully applied to the problems of transcript-level quantification from RNA-seq reads and the clustering of contigs from de novo assembled transcriptomes into biologically-meaningful groups.



Comparative analysis of RNA-Seq alignment algorithms and the RNA-Seq unified mapper (RUM)

An RNA-Seq simulator is developed that models the main impediments to RNA alignment, including alternative splicing, insertions, deletions, substitutions, sequencing errors and intron signal, and a pipeline based on BLAT is developed to explore the performance of established tools for this problem, and to compare it to the recently developed methods.

TopHat: discovering splice junctions with RNA-Seq

The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer.

Optimal spliced alignments of short sequence reads

A novel approach, called QPALMA, is presented which takes advantage of the read's quality information as well as computational splice site predictions to maximize alignment accuracy and facilitate mapping of massive amounts of sequencing data typically generated by the new technologies.

MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery

A second-generation splice detection algorithm, MapSplice, is introduced, with a focus on high sensitivity and specificity in the detection of splices as well as CPU and memory efficiency; the results indicate that MapSplice is a highly accurate algorithm for the alignment of RNA-seq reads to splice junctions.

Fast and SNP-tolerant detection of complex variants and splicing in short reads

Computational methods for fast detection of complex variants and splicing in short reads, based on a successively constrained search process of merging and filtering position lists from a genomic index are presented.

PASSion: a pattern growth algorithm-based pipeline for splice junction detection in paired-end RNA-Seq data

PASSion is a pattern growth algorithm-based pipeline for splice site detection in paired-end RNA-Seq reads that performs better than the other three approaches when detecting junctions in highly abundant transcripts and can discover differential and shared splicing patterns among multiple samples.

Detection of splice junctions from paired-end RNA-seq data by SpliceMap

A computational method, SpliceMap, to detect splice junctions from RNA-seq data is presented, which does not depend on any existing annotation of gene structures and is capable of finding novel splicing junctions with high sensitivity and specificity.

BLAT--the BLAST-like alignment tool.

The optimizations behind BLAT are described; BLAT is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences.

progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement

A new method to align two or more genomes that have undergone rearrangements due to recombination and substantial amounts of segmental gain and loss is described, demonstrating high accuracy in situations where genomes have undergone biologically feasible amounts of genome rearrangement, segmental gain and loss.

GENCODE: The reference human genome annotation for The ENCODE Project

This work has examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters, 62% of protein-coding genes have annotated polyA sites, and over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas.