Accurate Decoding of Pooled Sequenced Data Using Compressed Sensing

@inproceedings{Duma2013AccurateDO,
  title={Accurate Decoding of Pooled Sequenced Data Using Compressed Sensing},
  author={Denisa Duma and Mary Wootters and Anna C. Gilbert and Hung Quoc Ngo and Atri Rudra and Matthew Alpert and Timothy J. Close and Gianfranco Ciardo and Stefano Lonardi},
  booktitle={WABI},
  year={2013}
}
In order to overcome the limitations imposed by DNA barcoding when multiplexing a large number of samples in the current generation of high-throughput sequencing instruments, we have recently proposed a new protocol that leverages advances in combinatorial pooling design (group testing) [9]. We have also demonstrated how this new protocol would enable de novo selective sequencing and assembly of large, highly-repetitive genomes. Here we address the problem of decoding pooled sequenced data…
Scrible: Ultra-Accurate Error-Correction of Pooled Sequenced Reads
TLDR
A novel algorithm called Scrible is introduced that exploits properties of the pooling design to accurately identify/correct sequencing errors and minimize the chance of “over-correcting”.
Combinatorial pooled sequencing: experiment design and decoding
TLDR
The experiment design and decoding procedure for combinatorial pooled sequencing, as applied to screening for carriers of rare variants and rare haplotypes, complex genome assembly, and single-individual haplotyping, is surveyed.
When Less is More: “Slicing” Sequencing Data Improves Read Decoding Accuracy and De Novo Assembly Quality
TLDR
This work explores the effect of ultra-deep sequencing data in two domains: the problem of decoding reads to BAC clones (in the context of the combinatorial pooling design proposed in [1]), and the problem of de novo assembly of BAC clones, and shows for the first time that modern de novo assemblers cannot take advantage of ultra-deep sequencing data.
Computational framework for next-generation sequencing of heterogeneous viral populations using combinatorial pooling
TLDR
A cost-effective and reliable protocol for sequencing of viral samples is developed that combines NGS using barcoding and combinatorial pooling with a computational framework, including algorithms for optimal virus-specific pool design and deconvolution of individual samples from sequenced pools.
Computational Methods for Sequencing and Analysis of Heterogeneous RNA Populations
TLDR
This work presents novel computational methods for cost- and time-effective analysis of sequencing data from viral and RNA samples, and describes a method for mass spectrometry data analysis and a combinatorial pooling method.
A Combinatorial Pooling Strategy for the Selective Sequencing of Very Large and Repetitive Genomes
  • D. Duma
  • Biology, Computer Science
  • 2013
TLDR
It is shown that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when preparing sequencing libraries for hundreds or thousands of DNA samples, such as in this case gene-bearing minimum-tiling-path BAC clones.
FHAST: FPGA-Based Acceleration of Bowtie in Hardware
TLDR
FHAST (FPGA hardware accelerated sequence-matching tool), a drop-in replacement for BOWTIE that uses a hardware design based on field programmable gate arrays (FPGA) that masks memory latency by executing multiple concurrent hardware threads accessing memory simultaneously.
Stronger L2/L2 compressed sensing; without iterating
TLDR
The main contribution of this work is an improvement over [Gilbert, Li, Porat and Strauss, STOC 2010] with faster decoding time and significantly smaller column sparsity, answering two open questions of the aforementioned work.
Construction of a map-based reference genome sequence for barley, Hordeum vulgare L.
TLDR
The experimental and computational procedures to sequence and assemble more than 80,000 bacterial artificial chromosome (BAC) clones along the minimum tiling path of a genome-wide physical map and construct 4,265 non-redundant sequence scaffolds representing clusters of overlapping BACs are reported.
Sublinear-Time Sparse Recovery, and Its Power in the Design of Exact Algorithms
TLDR
Several new contributions to the field of sparse recovery are described, as well as how sparse recovery techniques can be of great significance in the design of exact algorithms, outside the scope of the problems for which they were first created.

References

DNA Sudoku--harnessing high-throughput sequencing for multiplexed specimen analysis.
TLDR
This work reports a strategy that permits simultaneous analysis of tens of thousands of specimens through the use of combinatorial pooling strategies in which pools rather than individual specimens are assigned barcodes.
Optimal pooling for genome re-sequencing with ultra-high-throughput short-read technologies
TLDR
This article focuses on re-sequencing experiments using the Solexa technology, based on bacterial artificial chromosome (BAC) clones, and offers combinatorial solutions based on approximation algorithms for the well-known max n-cut problem and the related max n-section problem on hypergraphs.
Compressed Genotyping
TLDR
Using methods and ideas from compressed sensing and group testing, a cost-effective genotyping protocol to detect carriers for severe genetic disorders is developed and adapted to a recently developed class of high throughput DNA sequencing technologies.
Assemblathon 1: a competitive assessment of de novo short read assembly methods.
TLDR
The Assemblathon 1 competition is described, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies, and it is established that it is possible to assemble the genome to a high level of coverage and accuracy.
Bacterial Community Reconstruction Using Compressed Sensing
TLDR
A novel approach for reconstruction of the composition of an unknown mixture of bacteria using a single Sanger-sequencing reaction of the mixture is proposed, based on compressive sensing theory, which deals with reconstruction of a sparse signal using a small number of measurements.
Combinatorial Pooling Enables Selective Sequencing of the Barley Gene Space
TLDR
It is shown that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when preparing sequencing libraries for hundreds or thousands of DNA samples, such as in this case gene-bearing minimum-tiling-path BAC clones.
Efficient de novo assembly of large genomes using compressed data structures.
TLDR
A new assembler based on the overlap-based string graph model of assembly, SGA (String Graph Assembler), which provides the first practical assembler for a mammalian-sized genome on a low-end computing cluster and is simply parallelizable.
Identification of rare alleles and their carriers using compressed se(que)nsing
TLDR
A novel pooling design is proposed that enables the recovery of novel or known rare alleles and their carriers in groups of individuals and can be combined with barcoding techniques to provide a feasible solution based on current resequencing costs.
Barcoding bias in high-throughput multiplex sequencing of miRNA.
TLDR
It is reported that barcodes introduced through adapter ligation confer significant bias on miRNA expression profiles, which is much higher than the expected Poisson noise and masks significant expression differences between miRNA libraries.
Efficiently Decodable Compressed Sensing by List-Recoverable Codes and Recursion
TLDR
Two recursive techniques to construct compressed sensing schemes that can be "decoded" in sub-linear time are presented, based on the well-studied code concatenation method where the "outer" code has strong list recoverability properties.