A statistical method for the detection of variants from next-generation resequencing of DNA pools

  title={A statistical method for the detection of variants from next-generation resequencing of DNA pools},
  author={Vikas Kumar Bansal and Ondrej Libiger},
  pages={3213 - 3213}
Motivation: High-throughput sequencing technologies have made population-scale studies of human genetic variation possible. Accurate and comprehensive detection of DNA sequence variants is crucial for the success of these studies. Small insertions and deletions represent the second most frequent class of variation in the human genome after single nucleotide polymorphisms (SNPs). Although several alignment tools for the gapped alignment of sequence reads to a reference genome are available… 

Evaluation of variant detection software for pooled next-generation sequence data
This manuscript evaluates five different variant detection programs with regard to their ability to detect variants in synthetically pooled Illumina sequencing data, by creating simulated pooled binary alignment/map files using single-sample sequencing data from varying numbers of previously characterized samples at varying depths of coverage per sample.
An evaluation of pool-sequencing transcriptome-based exon capture for population genomics in non-model species
This approach can also be used to predict the intron-exon boundaries of targeted de novo transcripts, making it possible to abolish genotyping biases near exon ends, and an approach for simplifying bioinformatic analyses by mapping genomic reads directly to targeted transcript sequences to obtain coding variants is evaluated.
A survey of tools for variant analysis of next-generation genome sequencing data
A comprehensive survey and evaluation of NGS tools provides a valuable guideline for human geneticists working on Mendelian disorders, complex diseases and cancers.
Comprehensive Analysis to Improve the Validation Rate for Single Nucleotide Variants Detected by Next-Generation Sequencing
This detailed and systematic study provides comprehensive recommendations for improving validation rates, saving time and lowering cost in NGS analyses.
Statistical Methods for Characterizing Genomic Heterogeneity in Mixed Samples
A Bayesian statistical method for single-nucleotide-level analysis and a global optimization method for gene-expression-level analysis are presented to characterize genomic heterogeneity in mixed samples; the latter finds a guaranteed ε-global optimum for a sparse mixed membership matrix factorization problem used in molecular subtype classification.
Homozygous loss-of-function variants in European cosmopolitan and isolate populations
Overall HLOF genes are enriched for olfactory receptor function and are expressed in testes more often than expected, consistent with reduced purifying selection and incipient pseudogenisation.
Genomic Evidence of Rapid and Stable Adaptive Oscillations over Seasonal Time Scales in Drosophila
This work hypothesized that environmental fluctuations among seasons in a North American orchard would impose temporally variable selection on Drosophila melanogaster that would drive repeatable adaptive oscillations at balanced polymorphisms. It identified hundreds of polymorphisms whose frequency oscillates among seasons and argued that these loci are subject to strong, temporally variable selection.
Clinal and seasonal change are correlated in Drosophila melanogaster natural populations
It is shown that there is a genome-wide correlation between clinal and seasonal variation, which cannot be explained by linked selection alone and is stronger in genomic regions with higher functional content, consistent with natural selection.


Accurate detection and genotyping of SNPs utilizing population sequencing data.
Next-generation sequencing technologies have made it possible to sequence targeted regions of the human genome in hundreds of individuals. Deep sequencing represents a powerful approach for the
A map of human genome variation from population-scale sequencing
The pilot phase of the 1000 Genomes Project is presented, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms, and the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants are described.
SNP detection and genotyping from low-coverage sequencing data on multiple diploid samples.
Methods to discover and genotype single-nucleotide polymorphism (SNP) sites from low-coverage sequencing data, making use of shared haplotype (linkage disequilibrium) information are presented.
Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data
The impact of SNP variation on the reliability of read-mapping in the context of detecting allele-specific expression (ASE) is investigated and it is found that even after masking, ∼5–10% of SNPs still have an inherent bias toward more effective mapping of one allele.
SNP detection for massively parallel whole-genome resequencing.
A consensus-calling and SNP-detection method for sequencing-by-synthesis Illumina Genome Analyzer technology that has a very low false call rate at any sequencing depth and excellent genome coverage at a high sequencing depth.
Mapping short DNA sequencing reads and calling variants using mapping quality scores.
This work describes MAQ, software that builds assemblies by mapping shotgun short reads to a reference genome and uses quality scores to derive genotype calls for the consensus sequence of a diploid genome, e.g., from a human sample.
Deep sequencing to reveal new variants in pooled DNA samples
Exons 2 to 16 of the MUTYH gene were analyzed in breast cancer patients with Illumina's (Solexa) technology and the results provide directions for designing high‐throughput analyses of candidate genes.
Accurate Whole Human Genome Sequencing using Reversible Terminator Chemistry
An approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost is reported, effective for accurate, rapid and economical whole-genome re-sequencing and many other biomedical applications.
Dindel: accurate indel calls from short-read data.
This work proposes a Bayesian method to call indels from short-read sequence data in individuals and populations by realigning reads to candidate haplotypes that represent alternative sequence to the reference, and achieves low false discovery rates on simulated and real data sets.