Fast and sensitive protein alignment using DIAMOND
TLDR
DIAMOND, an open-source algorithm based on double indexing, is introduced; it is 20,000 times faster than BLASTX on short reads and has a similar degree of sensitivity.
CNV-seq, a new method to detect copy number variation using high-throughput sequencing
TLDR
The results show that the number of reads, not the length of the reads, is the key factor determining the resolution of detection, which favors next-generation sequencing methods that rapidly produce large numbers of short reads.
A poor man’s BLASTX—high-throughput metagenomic protein database search using PAUDA
TLDR
A new approach to protein database search called PAUDA is introduced, which runs ∼10 000 times faster than BLASTX while achieving about one-third of the assignment rate of reads to KEGG orthology groups and producing gene and taxon abundance profiles that are highly correlated with those obtained with BLASTX.
Deep sequencing of 10,000 human genomes
TLDR
This work reports on the sequencing of 10,545 human genomes at 30×–40× coverage with an emphasis on quality metrics and novel variant and sequence discovery and concludes that high-coverage genome sequencing provides accurate detail on human variation for discovery and clinical applications.
RiboTagger: fast and unbiased 16S/18S profiling using whole community shotgun metagenomic or metatranscriptome surveys
TLDR
A new program called RiboTagger is presented that identifies and extracts taxonomically informative ribotags located in a specified variable region of the SSU gene in a high-throughput fashion and is substantially faster than comparable programs.
Deep Sequencing of 10,000 Human Genomes
TLDR
This work represents the largest effort to date in sequencing human genomes at deep coverage with these new standards, and concludes that high-coverage genome sequencing provides accurate detail on human variation for discovery and for clinical applications.
The blood DNA virome in 8,000 humans
The characterization of the blood virome is important for the safety of blood-derived transfusion products, and for the identification of emerging pathogens. We explored non-human sequence data from
Fast and simple protein-alignment-guided assembly of orthologous gene families from microbiome sequencing reads
TLDR
The performance of this approach compares favorably with that of full-featured assemblers and that of a recently published HMM-based gene-centric assembler, both in terms of the number of reference genes detected and of the percentage of reference sequence covered.
MetaScope - Fast and accurate identification of microbes in metagenomic sequencing data
TLDR
MetaScope is a fast and accurate tool for analyzing (host-associated) metagenome datasets and is the winner of the 2013 DTRA software challenge entitled "Identify Organisms from a Stream of DNA Sequences".
Identification of individuals by trait prediction using whole-genome sequencing data
TLDR
A maximum entropy algorithm is developed that integrates multiple predictions to determine which genomic samples and phenotype measurements originate from the same person; the results may have far-reaching ethical and legal implications.