Fast and sensitive protein alignment using DIAMOND

Benjamin Buchfink, Chao Xie and Daniel H. Huson. Nature Methods.
The alignment of sequencing reads against a protein reference database is a major computational bottleneck in metagenomics and data-intensive evolutionary projects. Although recent tools offer improved performance over the gold standard BLASTX, they exhibit only a modest speedup or low sensitivity. We introduce DIAMOND, an open-source algorithm based on double indexing that is 20,000 times faster than BLASTX on short reads and has a similar degree of sensitivity. 
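The double-indexing idea can be sketched roughly as follows. Seeds are collected from both the queries and the references, each list is sorted, and matching seeds are found with a single linear merge of the two sorted lists instead of probing a reference index once per query seed. This is a minimal sketch, assuming plain contiguous k-mers as seeds. It omits refinements such as spaced seeds, reduced alphabets and the subsequent hit extension, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Seed = Tuple[str, str, int]  # (seed string, sequence id, offset)


def build_seed_list(seqs: Dict[str, str], k: int) -> List[Seed]:
    """Collect every length-k substring of every sequence and sort by seed."""
    entries: List[Seed] = []
    for seq_id, seq in seqs.items():
        for i in range(len(seq) - k + 1):
            entries.append((seq[i:i + k], seq_id, i))
    entries.sort()
    return entries


def double_index_hits(queries: Dict[str, str], references: Dict[str, str],
                      k: int = 3) -> List[Tuple[str, int, str, int]]:
    """Merge-join the two sorted seed lists; return (query_id, q_off, ref_id, r_off)."""
    q = build_seed_list(queries, k)
    r = build_seed_list(references, k)
    hits = []
    i = j = 0
    while i < len(q) and j < len(r):
        if q[i][0] < r[j][0]:
            i += 1
        elif q[i][0] > r[j][0]:
            j += 1
        else:
            seed = q[i][0]
            # Find the run of identical seeds in each list.
            i_end = i
            while i_end < len(q) and q[i_end][0] == seed:
                i_end += 1
            j_end = j
            while j_end < len(r) and r[j_end][0] == seed:
                j_end += 1
            # Every query occurrence pairs with every reference occurrence.
            for _, qid, qoff in q[i:i_end]:
                for _, rid, roff in r[j:j_end]:
                    hits.append((qid, qoff, rid, roff))
            i, j = i_end, j_end
    return hits


hits = double_index_hits({"q1": "MKVLA"}, {"r1": "AMKVLW"}, k=3)
```

Here `q1` and `r1` share the seeds `MKV` and `KVL`, so two seed hits are reported. In a real aligner each hit would then be extended into a scored alignment.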
Sensitive protein alignments at tree-of-life scale using DIAMOND
An improved version of DIAMOND is introduced that greatly exceeds previous search performance and harnesses supercomputing to perform tree-of-life-scale protein alignments in hours, while matching the sensitivity of the gold standard BLASTP.
Fast protein database as a service with kAAmer
Identification of proteins is one of the most computationally intensive steps in genomics studies. It usually relies on aligners that don’t accommodate rich information on proteins and require
AC-DIAMOND: Accelerating Protein Alignment via Better SIMD Parallelization and Space-Efficient Indexing
This paper introduces an even faster protein alignment tool, called AC-DIAMOND, which attempts to speed up DIAMOND via better SIMD parallelization and more space-efficient indexing of the reference database; the latter allows more queries to be loaded into the memory and processed together.
MMseqs2: sensitive protein sequence searching for the analysis of massive data sets
The open-source software MMseqs2 improves on current search tools over the full range of the speed-sensitivity trade-off, achieving sensitivities better than PSI-BLAST at more than 400 times its speed.
Fast and sensitive protein sequence homology searches using hierarchical cluster BLAST
This work presents a pipeline that improves the speed of amino acid sequence homology searches with a minimal decrease in sensitivity and specificity by searching against hierarchical clusters.
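Searching against clusters can be illustrated with a single-level version of the idea. The query is first compared only with each cluster's representative, and the members of a cluster are scored only if its representative passes a threshold. This is a minimal sketch under stated assumptions: k-mer Jaccard similarity stands in for a BLAST alignment score, and the thresholds and function names are hypothetical, not the pipeline's actual settings.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int = 3) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of k-mer sets; a cheap stand-in for an alignment score."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def cluster_search(query: str, clusters: Dict[str, List[str]],
                   rep_threshold: float = 0.2,
                   hit_threshold: float = 0.5) -> List[Tuple[str, float]]:
    """clusters maps a representative sequence to its member sequences.
    Stage 1 compares the query with representatives only; stage 2 scores
    the members of clusters whose representative passed."""
    hits = []
    for rep, members in clusters.items():
        if similarity(query, rep) < rep_threshold:
            continue  # the whole cluster is skipped without scoring its members
        for member in members:
            score = similarity(query, member)
            if score >= hit_threshold:
                hits.append((member, score))
    return sorted(hits, key=lambda h: -h[1])


clusters = {
    "MKVLAAGG": ["MKVLAAGG", "MKVLAAGA"],
    "WWWPPPRR": ["WWWPPPRR", "WWWPPPRK"],
}
hits = cluster_search("MKVLAAGG", clusters)
```

The speed gain comes from skipping whole clusters. The cost is that a homolog sitting in a cluster whose representative scores below the threshold is never examined, which is where a small loss of sensitivity can arise.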
GHOSTX: A Fast Sequence Homology Search Tool for Functional Annotation of Metagenomic Data.
GHOSTX is a sequence homology search tool specifically developed for functional annotation of metagenome sequences that is more than 160 times faster than BLASTX and has sufficient search sensitivity for metagenomic analysis.
Genome Recovery, Functional Profiling, and Taxonomic Classification from Metagenomes.
All the steps that lead from raw reads to a collection of quality-controlled, functionally annotated bacterial genomes are reviewed and a working protocol is proposed using state-of-the-art, open source software tools.
taxMaps: comprehensive and highly accurate taxonomic classification of short-read data in reasonable time.
It is demonstrated that taxMaps is more sensitive and more precise than widely used taxonomic classifiers and is capable of delivering classification accuracy comparable to that of BLASTN, but at up to three orders of magnitude less computational cost.
MAGpy: a reproducible pipeline for the downstream analysis of metagenome-assembled genomes (MAGs)
MAGpy is presented: a Snakemake pipeline that takes FASTA input, compares MAGs to several public databases, checks quality, assigns a taxonomy and draws a phylogenetic tree.
Taxonomic analysis of metagenomic data with kASA
The taxonomic analysis of sequencing data has become important in many areas of life sciences. However, currently available tools for that purpose either consume large amounts of RAM or


RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data
RAPSearch2 is presented: a new memory-efficient implementation of the RAPSearch algorithm that uses a collision-free hash table to index the similarity search database. An optimized data structure further speeds up the similarity search.
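One general way to make a k-mer hash table collision-free is direct addressing: encode each k-mer as a base-20 integer, so distinct k-mers always map to distinct slots. This sketch assumes that scheme for illustration only; it is not claimed to be RAPSearch2's exact indexing method, and the function names are hypothetical.

```python
from typing import List, Tuple

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
CODE = {aa: i for i, aa in enumerate(ALPHABET)}


def kmer_key(kmer: str) -> int:
    """Encode a k-mer as a base-20 integer; distinct k-mers get distinct keys.
    Raises KeyError for letters outside ALPHABET (e.g. 'X')."""
    key = 0
    for aa in kmer:
        key = key * len(ALPHABET) + CODE[aa]
    return key


def build_direct_index(db: List[str], k: int) -> List[List[Tuple[int, int]]]:
    """Direct-address table with one bucket per possible k-mer (20**k slots).
    Because kmer_key is injective, no two different k-mers share a bucket."""
    table: List[List[Tuple[int, int]]] = [[] for _ in range(len(ALPHABET) ** k)]
    for seq_id, seq in enumerate(db):
        for pos in range(len(seq) - k + 1):
            table[kmer_key(seq[pos:pos + k])].append((seq_id, pos))
    return table


def lookup(table: List[List[Tuple[int, int]]], kmer: str) -> List[Tuple[int, int]]:
    return table[kmer_key(kmer)]


table = build_direct_index(["MKVLA", "AKVLM"], k=3)
```

The table has 20**k slots regardless of database size, so this layout is practical only for short k or a reduced amino-acid alphabet. The trade-off is memory for the table versus never having to resolve collisions at lookup time.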
BLAT--the BLAST-like alignment tool.
The optimization of BLAT is described. BLAT is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments, and 50 times faster for protein alignments, at sensitivity settings typically used when comparing vertebrate sequences.
A poor man’s BLASTX—high-throughput metagenomic protein database search using PAUDA
A new approach to protein database search called PAUDA is introduced. It runs ∼10 000 times faster than BLASTX while achieving about one-third of the assignment rate of reads to KEGG orthology groups, and it produces gene and taxon abundance profiles that are highly correlated with those obtained with BLASTX.
PatternHunter: faster and more sensitive homology search
A new homology search algorithm 'PatternHunter' is presented that uses a novel seed model for increased sensitivity and new hit-processing techniques for significantly increased speed.
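The seed model PatternHunter is known for is the spaced seed. Instead of requiring k consecutive matching positions, a hit requires matches only at the positions marked `1` in a fixed pattern, and positions marked `0` may mismatch. The sketch below uses a short hypothetical pattern (`11011`), not the pattern from the paper, and omits the hit-processing techniques the abstract also mentions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def spaced_key(s: str, start: int, pattern: str) -> str:
    """Characters of s at the '1' (care) positions of the pattern window."""
    return "".join(s[start + i] for i, c in enumerate(pattern) if c == "1")


def spaced_seed_hits(query: str, subject: str, pattern: str) -> List[Tuple[int, int]]:
    """Return (query_offset, subject_offset) pairs whose windows agree at every
    '1' position of the pattern. A contiguous seed is the all-'1' special case."""
    span = len(pattern)
    index: Dict[str, List[int]] = defaultdict(list)
    for j in range(len(subject) - span + 1):
        index[spaced_key(subject, j, pattern)].append(j)
    hits = []
    for i in range(len(query) - span + 1):
        for j in index.get(spaced_key(query, i, pattern), []):
            hits.append((i, j))
    return hits


# One mismatch at offset 2 ("G" vs "T"):
spaced = spaced_seed_hits("ACGTA", "ACTTA", "11011")   # same weight (4) ...
contiguous = spaced_seed_hits("ACGTA", "ACTTA", "1111")  # ... as this contiguous seed
```

With the mismatch falling on a `0` position, the spaced seed still reports a hit at `(0, 0)`. The contiguous seed of the same weight finds nothing, because no four consecutive positions match.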
Search and clustering orders of magnitude faster than BLAST
R. Edgar, 2010.
UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters and offers several advantages over the widely used program CD-HIT, including higher speed, lower memory use, improved sensitivity, clustering at lower identities and classification of much larger datasets.
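Assigning sequences to clusters in this family of tools generally follows a greedy, centroid-based scheme. Each input sequence joins the first existing centroid it matches at or above an identity threshold; otherwise it becomes a new centroid. This is a minimal sketch assuming that scheme. An ungapped positional identity stands in for the alignment-based identity a real tool would compute, and USEARCH's fast search heuristics are not reproduced. The function names are hypothetical.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Ungapped positional identity; a crude stand-in for alignment identity."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_cluster(seqs: List[str], min_id: float = 0.9) -> Dict[str, List[str]]:
    """Each sequence joins the first centroid it matches at >= min_id,
    otherwise it founds a new cluster. The result depends on input order."""
    clusters: Dict[str, List[str]] = {}
    for s in seqs:
        for centroid in clusters:
            if identity(s, centroid) >= min_id:
                clusters[centroid].append(s)
                break
        else:
            clusters[s] = [s]  # no centroid was close enough
    return clusters


clusters = greedy_cluster(["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC"], min_id=0.8)
```

Here the second sequence differs from the first at one of ten positions and joins its cluster. The third shares only three positions and starts a new one. Because the outcome depends on the order in which sequences arrive, input is usually sorted (for example by length or abundance) before clustering.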
Basic local alignment search tool.
mrsFAST: a cache-oblivious algorithm for short-read mapping
In almost all recent structural variation discovery (SVD) studies, short reads from a donor genome have been mapped to a reference genome as a first step. The accuracy of an SVD study is directly correlated with this mapping step, which is also the main computational bottleneck of the study.
Blocks database and its applications.
A database of blocks has been constructed by successive application of the fully automated PROTOMAT system to lists of protein family members obtained from Prosite documentation, and has proved useful for derivation of amino acid substitution matrices and other sets of parameters.
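The substitution-matrix derivation can be illustrated with the standard log-odds recipe used for BLOSUM-style matrices. First count all residue pairs within each aligned column of the blocks to get observed pair frequencies q_ij. Next derive background frequencies p_i, then the expected pair frequencies e_ij (p_i² for identical residues, 2·p_i·p_j otherwise). Each score is then round(2·log2(q_ij/e_ij)), in half-bit units. This is a minimal sketch: it omits the sequence clustering and weighting step, and it scores only pairs actually observed in the toy input.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def log_odds_matrix(blocks: List[List[str]], scale: float = 2.0) -> Dict[Tuple[str, str], int]:
    """blocks: each block is a list of equal-length, ungapped aligned segments."""
    pair_counts: Counter = Counter()
    for block in blocks:
        for column in zip(*block):                    # one aligned column
            for a, b in combinations(column, 2):      # every unordered residue pair
                pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Background frequency of residue i: q_ii plus half of each q_ij with j != i.
    residues = {r for pair in q for r in pair}
    p = {r: 0.0 for r in residues}
    for (a, b), f in q.items():
        if a == b:
            p[a] += f
        else:
            p[a] += f / 2
            p[b] += f / 2

    scores = {}
    for (a, b), f in q.items():
        expected = p[a] * p[a] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(scale * math.log2(f / expected))
    return scores


scores = log_odds_matrix([["AAB", "AAB", "ABB"]])
```

In this toy block, A/A and B/B pairs occur more often than chance predicts and get positive scores. A/B pairs occur less often and get a negative score.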
Ab initio gene identification in metagenomic sequences
An algorithm for gene identification in DNA sequences derived from shotgun sequencing of microbial communities is described, together with its accuracy. With it, several thousand new genes could be added to existing annotations of several human and mouse gut metagenomes.
Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation
T. Rognes, BMC Bioinformatics, 2011.
Efficient parallelisation using SIMD on standard hardware makes it possible to run Smith-Waterman database searches more than six times faster than before.
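Inter-sequence parallelisation means each SIMD lane holds a different database sequence, so one vector operation advances the Smith-Waterman recurrence for several database sequences at the same cell position. The sketch below mimics only that data layout: ordinary Python lists stand in for SIMD registers, and no vector instructions are used. It assumes a linear gap penalty and hypothetical match/mismatch scores, whereas production implementations typically use affine gaps and substitution matrices.

```python
from typing import List


def sw_scores_batch(query: str, db: List[str], match: int = 2,
                    mismatch: int = -1, gap: int = 2) -> List[int]:
    """Best local alignment score of `query` against each sequence in `db`.
    Index `lane` of every inner list belongs to db[lane], mirroring the
    inter-sequence layout where each SIMD lane holds one database sequence."""
    n_lanes = len(db)
    width = max((len(s) for s in db), default=0)
    # h_prev[j][lane]: H value of the previous query row at database position j.
    h_prev = [[0] * n_lanes for _ in range(width + 1)]
    best = [0] * n_lanes
    for q in query:
        h_cur = [[0] * n_lanes for _ in range(width + 1)]
        for j in range(1, width + 1):
            # This inner loop over lanes is what one SIMD instruction would do.
            for lane in range(n_lanes):
                seq = db[lane]
                if j > len(seq):
                    continue  # padding: this lane's sequence has ended
                s = match if seq[j - 1] == q else mismatch
                h = max(0,
                        h_prev[j - 1][lane] + s,   # diagonal: align q with seq[j-1]
                        h_prev[j][lane] - gap,     # gap in the database sequence
                        h_cur[j - 1][lane] - gap)  # gap in the query
                h_cur[j][lane] = h
                best[lane] = max(best[lane], h)
        h_prev = h_cur
    return best


scores = sw_scores_batch("ACGT", ["ACGT", "TTTT", "AC"])
```

Because every lane executes the same operation at each step, the lanes stay in lock-step. Padded positions of shorter sequences are simply skipped here; a real SIMD kernel would instead mask them or fill lanes with the next database sequence.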