SeekDeep: single-base resolution de novo clustering for amplicon deep sequencing

@article{Hathaway2018SeekDeepSR,
  title={SeekDeep: single-base resolution de novo clustering for amplicon deep sequencing},
  author={Nicholas J. Hathaway and Christian M. Parobek and Jonathan J. Juliano and Jeffrey A. Bailey},
  journal={Nucleic Acids Research},
  year={2018},
  volume={46},
  pages={e21}
}
Abstract

PCR amplicon deep sequencing continues to transform the investigation of genetic diversity in viral, bacterial, and eukaryotic populations. In eukaryotic populations such as Plasmodium falciparum infections, it is important to discriminate sequences differing by a single nucleotide polymorphism. In bacterial populations, single-base resolution can improve discrimination of species and strains. Here, we introduce the SeekDeep suite built around the qluster algorithm, which is…
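To make "sequences differing by a single nucleotide polymorphism" concrete, here is a minimal Python sketch. It is an illustration only and is not SeekDeep's qluster algorithm, whose details are cut off in this excerpt. It collapses identical reads into unique sequences with counts, then lists every pair of equal-length unique sequences that differ at exactly one position. The function names `hamming` and `single_base_pairs` and the toy reads are hypothetical.

```python
from collections import Counter
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def single_base_pairs(reads):
    """Collapse identical reads, then return (seq1, count1, seq2, count2)
    for every pair of distinct sequences that differ by exactly one base.

    Hypothetical helper for illustration; it does not decide whether a
    one-base difference is a real variant or a sequencing/PCR error.
    """
    counts = Counter(reads)  # unique sequence -> number of reads
    pairs = []
    for a, b in combinations(sorted(counts), 2):
        # Only equal-length sequences are compared (no indel handling).
        if len(a) == len(b) and hamming(a, b) == 1:
            pairs.append((a, counts[a], b, counts[b]))
    return pairs


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGTTC", "ACGTTC", "TTGTAC"]
    for a, ca, b, cb in single_base_pairs(reads):
        print(f"{a} (x{ca}) vs {b} (x{cb}): 1 mismatch")
```

On the toy reads, only `ACGTAC` (3 reads) and `ACGTTC` (2 reads) are reported. `TTGTAC` differs from both at two or more positions. A real pipeline must still decide whether the minority sequence is a genuine variant or an error, for example from read counts and base qualities. This sketch deliberately leaves that step out.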

Citations

Genome-wide locus sequence typing (GLST) of eukaryotic pathogens
TLDR
This study generates a flexible GLST primer panel design workflow for Trypanosoma cruzi, the parasitic agent of Chagas disease, and applies the 203-target GLST panel to direct, culture-free metagenomic extracts from triatomine vectors containing a minimum of 3.69 pg/μl T. cruzi DNA.
A suite of computational tools to interrogate sequence data with local haplotype analysis within complex Plasmodium infections and other microbial mixtures
TLDR
SeekDeep, a pipeline for analyzing targeted amplicon sequencing datasets from various technologies, achieves single-base resolution even at low frequencies and read depths, allowing accurate comparison between samples and detection of important SNPs.
Enabling high-accuracy long-read amplicon sequences using unique molecular identifiers and Nanopore sequencing
TLDR
A high-throughput amplicon sequencing approach is presented that combines unique molecular identifiers (UMIs) with Oxford Nanopore sequencing to generate single-molecule consensus sequences of large genomic regions, paving the way for widespread use of high-accuracy amplicon sequencing in a variety of genomic applications.
Sensitive, Highly Multiplexed Sequencing of Microhaplotypes From the Plasmodium falciparum Heterozygome
TLDR
The bioinformatic and laboratory methods outlined here provide a flexible tool for efficient, low-cost, high throughput interrogation of the P. falciparum genome, and can be tailored to simultaneously address multiple questions of interest in various epidemiological settings.
Amplicon deep sequencing of low-density Plasmodium falciparum infections: an evaluation of analysis approaches
TLDR
Amplicon deep sequencing successfully determines the complexity and diversity of low-density Plasmodium infections, even in the absence of technical PCR/sequencing replicates.
Detection of low-density Plasmodium falciparum infections using amplicon deep sequencing
TLDR
Amplicon deep sequencing can be used to determine the complexity and diversity of low-density Plasmodium infections, but error filtration approaches should not be uniformly applied across samples of varying parasitaemia.
High resolution species assignment of Anopheles mosquitoes using k-mer distances on targeted sequences
TLDR
NNoVAE, a method using Nearest Neighbours (NN) and Variational Autoencoders (VAE), is presented. It is applied to k-mers derived from the ANOSPP amplicon sequences to hierarchically assign species identity, and is used to survey Anopheles species diversity and Plasmodium transmission patterns across space and time on a large scale.
A targeted amplicon sequencing panel to simultaneously identify mosquito species and Plasmodium presence across the entire Anopheles genus
TLDR
A multilocus amplicon sequencing approach is developed that targets 62 highly variable loci in the Anopheles genome and two conserved loci in the Plasmodium mitochondrion, simultaneously revealing both the mosquito species and whether that mosquito carries malaria parasites.
High-accuracy long-read amplicon sequences using unique molecular identifiers with Nanopore or PacBio sequencing.
TLDR
A high-throughput amplicon sequencing approach combining unique molecular identifiers (UMIs) with Oxford Nanopore Technologies or Pacific Biosciences circular consensus sequencing is reported, yielding high-accuracy single-molecule consensus sequences of large genomic regions.
Mumame: a software tool for quantifying gene-specific point-mutations in shotgun metagenomic data
TLDR
A software tool called Mumame is provided, which can distinguish between wildtype and mutated sequences in shotgun metagenomic data and quantify their relative abundances; sequencing depth is identified as a key factor in detecting rare mutations.
...

References

Showing 1-10 of 31 references
Species-level resolution of 16S rRNA gene amplicons sequenced through the MinION™ portable nanopore sequencer
TLDR
Although nanopore-based sequencing produces reads with lower per-base accuracy compared with other platforms, the MinION™ DNA sequencer is valuable for both high taxonomic resolution and microbial diversity analysis.
Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons.
TLDR
A new chimera detection tool, Chimera Slayer (CS), is described. It detects chimeras with greater sensitivity than previous methods, performs well on short sequences such as those produced by the 454 Life Sciences (Roche) Genome Sequencer, and scales to large data sets.
Minimum entropy decomposition: Unsupervised oligotyping for sensitive partitioning of high-throughput marker gene sequences
TLDR
Minimum Entropy Decomposition (MED) provides a computationally efficient means of partitioning marker gene datasets into ‘MED nodes’, which represent homogeneous operational taxonomic units. It enables sensitive discrimination of closely related organisms in marker gene amplicon datasets without relying on extensive computational heuristics or user supervision.
Using Amplicon Deep Sequencing to Detect Genetic Signatures of Plasmodium vivax Relapse.
TLDR
Deep sequencing of a highly variable region of the P. vivax merozoite surface protein 1 gene revealed impressive diversity, generating 67 unique haplotypes and detecting on average 3.6 co-circulating parasite clones within individuals, compared with 2.1 clones detected by a combination of 3 microsatellite markers.
Challenges and opportunities in estimating viral genetic diversity from next-generation sequencing data
TLDR
The entire process of inferring viral diversity, from sample collection and preparation to computing measures of genetic diversity, is reviewed. The effect of experimental conditions on diversity estimates, due to in vitro base substitutions, insertions, deletions, and recombination, is also discussed.
Fast, accurate error-correction of amplicon pyrosequences using Acacia
TLDR
A tool for homopolymer error correction is developed that scales better than existing tools and uses a quicker but less sensitive statistical approach to distinguish between errors and genuine sequence differences.
Efficient error correction for next-generation sequencing of viral amplicons
TLDR
Two new efficient error correction algorithms optimized for viral amplicons are presented: k-mer-based error correction (KEC) and empirical frequency threshold (ET). Both are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses.
ShoRAH: estimating the genetic diversity of a mixed sample from next-generation sequencing data
TLDR
ShoRAH, a computational method for quantifying genetic diversity in a mixed sample and for identifying the individual clones in the population, while accounting for sequencing errors, is developed.
Inferring Correlation Networks from Genomic Survey Data
TLDR
It is shown that community diversity is the key factor that modulates the acuteness of such compositional effects, and a new approach is developed, called SparCC, which is capable of estimating correlation values from compositional data.
MetAmp: combining amplicon data from multiple markers for OTU analysis
Motivation: We present a novel method and corresponding application, MetAmp, to combine amplicon data from multiple genomic markers into Operational Taxonomic Units (OTUs) for microbial community
...