Fast and sensitive protein alignment using DIAMOND

@article{Buchfink2015FastAS,
  title={Fast and sensitive protein alignment using DIAMOND},
  author={Benjamin Buchfink and Chao Xie and Daniel H. Huson},
  journal={Nature Methods},
  year={2015},
  volume={12},
  pages={59--60}
}
The alignment of sequencing reads against a protein reference database is a major computational bottleneck in metagenomics and data-intensive evolutionary projects. Although recent tools offer improved performance over the gold standard BLASTX, they exhibit only a modest speedup or low sensitivity. We introduce DIAMOND, an open-source algorithm based on double indexing that is 20,000 times faster than BLASTX on short reads and has a similar degree of sensitivity. 
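The abstract names double indexing without spelling it out. The sketch below is one minimal reading of the idea, under simplifying assumptions: seeds are contiguous k-mers over the full amino-acid alphabet (DIAMOND itself uses spaced seeds over a reduced alphabet), the queries and the references are each turned into a seed list, both lists are sorted, and a single merge pass pairs up identical seeds as candidate hits. The function names, `k`, and the toy sequences are hypothetical, and the extension step that would follow each hit is omitted.

```python
def collect_seeds(seqs: dict[str, str], k: int) -> list[tuple[str, str, int]]:
    """Return (seed, sequence id, offset) for every contiguous k-mer, sorted by seed."""
    seeds = [(s[i:i + k], sid, i) for sid, s in seqs.items() for i in range(len(s) - k + 1)]
    seeds.sort()  # equal seeds become adjacent, as in a sorted seed index
    return seeds


def double_index_hits(queries: dict[str, str], refs: dict[str, str], k: int = 3):
    """Merge-join the sorted query and reference seed lists; return shared-seed hits."""
    q, r = collect_seeds(queries, k), collect_seeds(refs, k)
    i = j = 0
    hits = []
    while i < len(q) and j < len(r):
        if q[i][0] < r[j][0]:
            i += 1
        elif q[i][0] > r[j][0]:
            j += 1
        else:
            key = q[i][0]
            i_end, j_end = i, j
            while i_end < len(q) and q[i_end][0] == key:
                i_end += 1
            while j_end < len(r) and r[j_end][0] == key:
                j_end += 1
            # every query occurrence of this seed pairs with every reference occurrence
            hits += [(qid, qpos, rid, rpos)
                     for _, qid, qpos in q[i:i_end]
                     for _, rid, rpos in r[j:j_end]]
            i, j = i_end, j_end
    return hits


hits = double_index_hits({"q1": "MKVLAT"}, {"r1": "GGMKVLQQ", "r2": "AVLATW"})
print(sorted(hits))  # → [('q1', 0, 'r1', 2), ('q1', 1, 'r1', 3), ('q1', 2, 'r2', 1), ('q1', 3, 'r2', 2)]
```

Because both sides are sorted once and then walked in lockstep, matching seeds becomes a linear scan over two arrays instead of one random hash lookup per query seed; that memory locality is the kind of advantage a double-indexing scheme aims for.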

Citations

Sensitive protein alignments at tree-of-life scale using DIAMOND
TLDR
An improved version of DIAMOND is introduced that greatly exceeds previous search performance and harnesses supercomputing to perform tree-of-life scale protein alignments in hours, while matching the sensitivity of the gold standard BLASTP.
Fast protein database as a service with kAAmer
Identification of proteins is one of the most computationally intensive steps in genomics studies. It usually relies on aligners that don’t accommodate rich information on proteins and require …
AC-DIAMOND: Accelerating Protein Alignment via Better SIMD Parallelization and Space-Efficient Indexing
TLDR
This paper introduces an even faster protein alignment tool, called AC-DIAMOND, which attempts to speed up DIAMOND via better SIMD parallelization and more space-efficient indexing of the reference database; the latter allows more queries to be loaded into memory and processed together.
MMseqs2: sensitive protein sequence searching for the analysis of massive data sets
TLDR
The open-source software MMseqs2 (mmseqs.org) is presented, which improves on current search tools over the full range of the speed-sensitivity trade-off, achieving sensitivities better than PSI-BLAST at more than 400 times its speed.
Fast and sensitive protein sequence homology searches using hierarchical cluster BLAST
TLDR
This work presents a pipeline that speeds up amino acid sequence homology searches by searching against hierarchical clusters, with a minimal decrease in sensitivity and specificity.
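The two-stage idea behind searching against clusters can be sketched as follows, assuming a single level of clustering in which each cluster has one representative: the query is compared against representatives first, and members are scored only for clusters whose representative passes a threshold. The `score` function (shared 3-mers) and both thresholds are hypothetical stand-ins for a real BLAST comparison; a hierarchical scheme would apply the same pruning recursively over several levels.

```python
def kmers(s: str, k: int = 3) -> set[str]:
    """All contiguous k-mers of s."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def score(a: str, b: str) -> int:
    """Hypothetical similarity score: number of shared 3-mers (a stand-in for BLAST)."""
    return len(kmers(a) & kmers(b))


def cluster_search(query: str, clusters: dict[str, list[str]],
                   rep_threshold: int, hit_threshold: int):
    """Search representatives first; only expand clusters whose representative scores well."""
    hits, comparisons = [], 0
    for rep, members in clusters.items():
        comparisons += 1
        if score(query, rep) < rep_threshold:
            continue  # prune the whole cluster without touching its members
        for m in members:
            comparisons += 1
            s = score(query, m)
            if s >= hit_threshold:
                hits.append((m, s))
    return hits, comparisons


clusters = {
    "MKVLATWA": ["MKVLATWA", "MKVLATWG", "MKILATWG"],
    "PPQRSTNN": ["PPQRSTNN", "PPQRSTND", "PPQKSTNN", "APQRSTNN"],
}
print(cluster_search("MKVLATWG", clusters, rep_threshold=2, hit_threshold=3))
# → ([('MKVLATWA', 5), ('MKVLATWG', 6), ('MKILATWG', 3)], 5)
```

Here 5 comparisons replace the 7 needed to score every member directly. The price is the small loss of sensitivity mentioned in the summary: a member similar to the query is missed whenever its representative falls below `rep_threshold`.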
GHOSTX: A Fast Sequence Homology Search Tool for Functional Annotation of Metagenomic Data.
TLDR
GHOSTX is a sequence homology search tool specifically developed for functional annotation of metagenome sequences that is more than 160 times faster than BLASTX and has sufficient search sensitivity for metagenomic analysis.
Genome Recovery, Functional Profiling, and Taxonomic Classification from Metagenomes.
TLDR
All the steps that lead from raw reads to a collection of quality-controlled, functionally annotated bacterial genomes are reviewed and a working protocol is proposed using state-of-the-art, open source software tools.
taxMaps: comprehensive and highly accurate taxonomic classification of short-read data in reasonable time.
TLDR
It is demonstrated that taxMaps is more sensitive and more precise than widely used taxonomic classifiers and is capable of delivering classification accuracy comparable to that of BLASTN, but at up to three orders of magnitude less computational cost.
MAGpy: a reproducible pipeline for the downstream analysis of metagenome-assembled genomes (MAGs)
TLDR
MAGpy is presented, a Snakemake pipeline that takes FASTA input, compares MAGs to several public databases, checks quality, assigns a taxonomy and draws a phylogenetic tree.
Taxonomic analysis of metagenomic data with kASA
The taxonomic analysis of sequencing data has become important in many areas of life sciences. However, currently available tools for that purpose either consume large amounts of RAM or …

References

Showing 1-10 of 20 references
RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data
TLDR
RAPSearch2 is presented, a new memory-efficient implementation of the RAPSearch algorithm that uses a collision-free hash table to index a similarity search database; an optimized data structure further speeds up the similarity search.
BLAT--the BLAST-like alignment tool.
TLDR
Describes how BLAT was optimized. BLAT is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments, and 50 times faster for protein alignments, at sensitivity settings typically used when comparing vertebrate sequences.
A poor man’s BLASTX—high-throughput metagenomic protein database search using PAUDA
TLDR
A new approach to protein database search called PAUDA is introduced, which runs ∼10 000 times faster than BLASTX while achieving about one-third of the assignment rate of reads to KEGG orthology groups, and produces gene and taxon abundance profiles that are highly correlated with those obtained with BLASTX.
PatternHunter: faster and more sensitive homology search
TLDR
A new homology search algorithm 'PatternHunter' is presented that uses a novel seed model for increased sensitivity and new hit-processing techniques for significantly increased speed.
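To show what a spaced seed model means in practice, the sketch below builds lookup keys from only the '1' positions of a binary pattern, so a mismatch at a '0' position does not prevent a hit. The weight-11, length-18 pattern is the one usually quoted for PatternHunter and is used here only as an example; the helper names are hypothetical, and the sketch shows how keys are formed, not the paper's hit-processing techniques or its sensitivity analysis.

```python
SEED = "111010010100110111"  # '1' = position must match, '0' = "don't care"


def spaced_keys(seq: str, pattern: str = SEED):
    """Yield (offset, key); the key keeps only the residues at the pattern's '1' positions."""
    care = [i for i, c in enumerate(pattern) if c == "1"]
    for off in range(len(seq) - len(pattern) + 1):
        yield off, "".join(seq[off + i] for i in care)


def spaced_hits(a: str, b: str, pattern: str = SEED) -> list[tuple[int, int]]:
    """Return (offset in a, offset in b) pairs whose spaced keys are identical."""
    index: dict[str, list[int]] = {}
    for off, key in spaced_keys(b, pattern):
        index.setdefault(key, []).append(off)
    return [(i, j) for i, key in spaced_keys(a, pattern) for j in index.get(key, [])]


a = "ACGTTGCAAGCTTACGGA"
b_dont_care = "ACGATGCAAGCTTACGGA"  # mismatch at index 3, a '0' position in SEED
b_care = "ACGTCGCAAGCTTACGGA"       # mismatch at index 4, a '1' position in SEED
print(spaced_hits(a, b_dont_care))  # → [(0, 0)]
print(spaced_hits(a, b_care))       # → []
```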
Search and clustering orders of magnitude faster than BLAST
  • R. Edgar
  • Bioinformatics
  • 2010
TLDR
UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters and offers several advantages over the widely used program CD-HIT, including higher speed, lower memory use, improved sensitivity, clustering at lower identities and classification of much larger datasets.
Basic local alignment search tool.
A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) …
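BLAST approximates the maximal segment pair (MSP), the highest-scoring pair of equal-length (ungapped) segments from the two sequences, rather than computing it exhaustively. For reference, the brute-force sketch below computes the exact MSP score by running a maximum-subarray scan (Kadane's algorithm) along every diagonal of the comparison matrix in O(mn) time. The +1/-1 match/mismatch scores are an assumption for readability; protein searches would score residue pairs with a substitution matrix such as PAM or BLOSUM.

```python
def msp_score(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """Exact maximal segment pair score: best ungapped local alignment over all diagonals."""
    best = 0
    # diagonal d = j - i runs from -(len(a) - 1) to len(b) - 1
    for d in range(-(len(a) - 1), len(b)):
        run = 0
        i = max(0, -d)
        j = i + d
        while i < len(a) and j < len(b):
            s = match if a[i] == b[j] else mismatch
            run = max(0, run + s)  # Kadane: restart the segment when its sum drops below zero
            best = max(best, run)
            i += 1
            j += 1
    return best


print(msp_score("GGACGTCC", "TTACGTAA"))  # → 4 (the shared segment ACGT)
```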
mrsFAST: a cache-oblivious algorithm for short-read mapping
TLDR
In almost all recent structural variation discovery (SVD) studies, short reads from a donor genome have been mapped to a reference genome as a first step; the accuracy of such a study is directly correlated with this mapping step, which is also the main computational bottleneck of the SVD study.
Blocks database and its applications.
TLDR
A database of blocks has been constructed by successive application of the fully automated PROTOMAT system to lists of protein family members obtained from Prosite documentation, and has proved useful for derivation of amino acid substitution matrices and other sets of parameters.
Ab initio gene identification in metagenomic sequences
TLDR
An algorithm for gene identification in DNA sequences derived from shotgun sequencing of microbial communities is described along with its accuracy; using it, several thousand new genes could be added to existing annotations of several human and mouse gut metagenomes.
Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation
  • T. Rognes
  • BMC Bioinformatics
  • 2011
TLDR
Efficient parallelisation using SIMD on standard hardware makes it possible to run Smith-Waterman database searches more than six times faster than before.
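Inter-sequence parallelisation assigns a different database sequence to each SIMD lane, so one vector operation updates the same cell (i, j) of several Smith-Waterman matrices at once. The sketch below imitates that layout with Python lists standing in for vector registers, so it illustrates the data arrangement and the masking of lanes whose sequence has ended, not the speedup. It assumes a linear gap penalty and simple match/mismatch scores; a production SIMD implementation would typically use affine gaps, substitution-matrix scores, and narrow integer lanes.

```python
def sw_inter_sequence(query: str, db: list[str], match: int = 2,
                      mismatch: int = -1, gap: int = 2) -> list[int]:
    """Smith-Waterman scores of `query` against every sequence in `db`, one lane per sequence."""
    lanes = len(db)
    n = max((len(s) for s in db), default=0)
    # Column j holds residue j of every database sequence; None marks padding past its end.
    cols = [[s[j] if j < len(s) else None for s in db] for j in range(n)]
    prev = [[0] * lanes for _ in range(n + 1)]  # row i-1 of every matrix, indexed [column][lane]
    best = [0] * lanes
    for qc in query:
        cur = [[0] * lanes for _ in range(n + 1)]  # column 0 stays 0: local-alignment boundary
        for j in range(1, n + 1):
            col, diag, up, left = cols[j - 1], prev[j - 1], prev[j], cur[j - 1]
            # One "vector instruction": cell (i, j) is updated in all lanes together.
            cur[j] = [
                0 if col[k] is None  # masked lane: this sequence has already ended
                else max(0,
                         diag[k] + (match if col[k] == qc else mismatch),
                         up[k] - gap,
                         left[k] - gap)
                for k in range(lanes)
            ]
            best = [max(b, h) for b, h in zip(best, cur[j])]
        prev = cur
    return best


print(sw_inter_sequence("ACGT", ["ACGT", "TTACG", "GG"]))  # → [8, 6, 2]
```

Because each lane is an independent matrix, the lanes never wait on one another, which is one motivation for vectorising across sequences rather than within a single matrix; the cost is padding, which the `None` mask handles here.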