Random access in large-scale DNA data storage
TLDR
A large library of primers is designed and validated to enable individual recovery of all files stored within the DNA, and an algorithm is developed that greatly reduces the sequencing read coverage required for error-free decoding by maximizing the information extracted from all sequence reads.
Parsing Algebraic Word Problems into Equations
This paper formalizes the problem of solving multi-sentence algebraic word problems as that of generating and scoring equation trees. We use integer linear programming to generate equation trees and …
Scaling up DNA data storage and random access retrieval
TLDR
A novel coding scheme is developed that dramatically reduces the physical redundancy (sequencing read coverage) required for error-free decoding to a median of 5x, while maintaining levels of logical redundancy comparable to the best prior codes.
Clustering Billions of Reads for DNA Data Storage
TLDR
This work presents a novel distributed algorithm for approximately computing the underlying clusters of DNA sequences that achieves higher accuracy and a 1000x speedup on three real datasets.
Quantifying Molecular Bias in DNA Data Storage
TLDR
This paper uses millions of unique sequences from a DNA-based digital data archival system to study the oligonucleotide copy unevenness problem and shows that two important sources of bias are the synthesis process and the Polymerase Chain Reaction (PCR) process.
An empirical comparison of preservation methods for synthetic DNA data storage
TLDR
The findings show that errors and erasures are stochastic, with no practical difference in distribution between preservation methods; the physical density of the methods is also compared to support a discussion of the trade-off between stability and density.
Quantifying molecular bias in DNA data storage
TLDR
Millions of unique sequences from a DNA-based digital data archival system are used to study the oligonucleotide copy unevenness problem and it is shown that the two paramount sources of bias are the synthesis and amplification (PCR) processes.
Erratum: Random access in large-scale DNA data storage
TLDR
In the version of this article initially published, the references in the reference list were in the wrong order; the references have been renumbered as follows.
An Empirical Comparison of Preservation Methods for Synthetic DNA Data Storage
TLDR
Nine different methods used to preserve data files encoded in synthetic DNA are evaluated by accelerated aging of nearly 29,000 DNA sequences; the results show that errors and erasures are stochastic, with no practical difference in distribution between preservation methods.
A Comprehensive Study of Synthetic DNA Preservation for DNA Data Storage
TLDR
The findings show that errors and erasures are stochastic, with no practical difference in distribution between preservation methods; the physical density of the methods is also compared to support a discussion of the trade-off between stability and density.