A Characterization of the DNA Data Storage Channel

Authors: Reinhard Heckel, Gediminas Mikutis, and Robert N. Grass
Journal: Scientific Reports
Owing to its longevity and enormous information density, DNA, the molecule encoding biological information, has emerged as a promising archival storage medium. However, due to technological constraints, data can only be written onto many short DNA molecules that are stored in an unordered way, and can only be read by sampling from this DNA pool. Moreover, imperfections in writing (synthesis), reading (sequencing), storage, and handling of the DNA, in particular amplification via PCR, lead to a… 

On the Capacity of DNA-based Data Storage under Substitution Errors

This paper discusses a channel model that incorporates the main properties of DNA-based data storage, provides an intuitive interpretation of the capacity formula for relevant channel parameters, compares with sub-optimal decoding methods, and concludes with a discussion on cost-efficiency.

Uncertainties in synthetic DNA-based data storage

The general procedures of the state-of-the-art DNA-based digital data storage methods are summarized, highlighting the uncertainties involved in each step as well as potential approaches to correct them.

Quantifying molecular bias in DNA data storage

Millions of unique sequences from a DNA-based digital data archival system are used to study the oligonucleotide copy unevenness problem and it is shown that the two paramount sources of bias are the synthesis and amplification (PCR) processes.

DNA data storage, sequencing data-carrying DNA

This paper shows that, starting with a model size of 107MB, the reduced accuracy from model compression can be compensated by using simple error-correcting codes in the DNA sequences, paving the way for a portable data-carrying DNA read head.

Molecular digital data storage using DNA

How DNA can be adopted as a storage medium for custom data, as a potential future complement to current data storage media such as computer hard disks, optical disks and tape is discussed.

DNA-Based Storage: Models and Fundamental Limits

This work introduces a new channel model, called the noisy shuffling-sampling channel, which captures three key distinctive aspects of DNA storage systems: (1) the data is written onto many short DNA molecules; (2) the molecules are corrupted by noise during synthesis and sequencing; and (3) the data is read by randomly sampling from the DNA pool.
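
The three aspects listed above can be illustrated with a toy simulation. This is a minimal sketch, not the model from the paper: the function name `noisy_shuffling_sampling_channel`, the substitution-only noise, and uniform sampling with replacement are all assumptions made for illustration.

```python
import random

BASES = "ACGT"

def noisy_shuffling_sampling_channel(molecules, num_reads, sub_prob, rng):
    """Toy channel (hypothetical): sample strands, then corrupt each base.

    molecules: list of short DNA strings written to the pool
    num_reads: number of reads drawn from the pool
    sub_prob:  per-base substitution probability (assumed noise model)
    rng:       a random.Random instance, for reproducibility
    """
    reads = []
    for _ in range(num_reads):
        # Uniform sampling with replacement: the original order is lost,
        # and some molecules may be drawn several times or never.
        strand = rng.choice(molecules)
        # Substitution noise: each base is replaced by a different base
        # with probability sub_prob.
        noisy = "".join(
            rng.choice([b for b in BASES if b != base])
            if rng.random() < sub_prob
            else base
            for base in strand
        )
        reads.append(noisy)
    return reads

rng = random.Random(0)
pool = ["ACGTAC", "TTGACA", "GGCATC"]
reads = noisy_shuffling_sampling_channel(pool, num_reads=5, sub_prob=0.05, rng=rng)
print(reads)
```

Insertions and deletions, which real synthesis and sequencing also produce, are left out here to keep the sketch short.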

On the efficient digital code representation in DNA-based data storage

This work proposes to use a series of 48 bits to encode the digital information of a host into DNA representation, which is appropriate in end-to-end digital communication systems since it introduces a digital code regardless of the computer's architecture.

DP-DNA: A Digital Pattern-Aware DNA Storage System to Improve Encoding Density

A new Digital Pattern-Aware DNA storage system, called DP-DNA, is proposed that efficiently stores digital data in DNA with high encoding density. It uses a digital pattern-aware code (DPAC) to analyze the patterns of a binary sequence for a DNA strand and selects an appropriate code for encoding that sequence, achieving a high encoding density.

Robust data storage in DNA by de Bruijn graph-based decoding

A de Bruijn graph-based greedy path search algorithm (DBG-GPS) is presented that efficiently handles the indels, strand rearrangements, and strand breaks that emerge during synthesis, amplification, sequencing, and storage of DNA molecules by efficiently reconstructing the DNA strands.

DeSP: a systematic DNA storage error simulation pipeline

DeSP is a systematic DNA storage error Simulation Pipeline, which simulates the errors generated from all DNA storage stages and systematically guides the optimization of encoding redundancy.



Capacity-approaching DNA storage

This work reports a strategy to store and retrieve DNA information that is robust and approaches the theoretical maximum of information that can be stored per nucleotide and opens the possibility of highly reliable DNA-based storage that approaches the information capacity of DNA molecules and enables virtually unlimited data retrieval.

A DNA-Based Archival Storage System

An architecture for a DNA-based archival storage system is presented, structured as a key-value store, and leverages common biochemical techniques to provide random access, and a new encoding scheme is proposed that offers controllable redundancy, trading off reliability for density.

Fundamental limits of DNA storage systems

The storage capacity of DNA-based storage systems is characterized under a simple model, and it is shown that a simple index-based coding scheme is optimal.
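
The summary above does not spell out the coding scheme. As a rough sketch of what an index-based scheme might look like, the code below splits a bit string into chunks, prefixes each chunk with its position, and restores the order after the pool has been shuffled. The function names, the binary index format, and the chunk sizes are hypothetical choices, not taken from the paper.

```python
import random

def encode_with_index(data, chunk_len, index_len):
    """Split a bit string into chunks and prefix each with a binary index."""
    chunks = [data[i:i + chunk_len] for i in range(0, len(data), chunk_len)]
    return [format(i, f"0{index_len}b") + chunk for i, chunk in enumerate(chunks)]

def decode_by_index(reads, index_len):
    """Reorder error-free reads by their index prefix; duplicates collapse."""
    by_index = {}
    for read in reads:
        by_index[int(read[:index_len], 2)] = read[index_len:]
    # A molecule that was never sampled simply leaves a gap; this sketch
    # does not detect or repair missing chunks.
    return "".join(by_index[i] for i in sorted(by_index))

data = "110100101110"
strands = encode_with_index(data, chunk_len=4, index_len=2)
reads = strands + strands[:1]      # one molecule is read twice
random.Random(0).shuffle(reads)    # the pool carries no ordering
print(decode_by_index(reads, index_len=2) == data)
```

The index bits are overhead that cannot carry payload, which is why the trade-off between index length and strand length matters for capacity.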

A Rewritable, Random-Access DNA-Based Storage System

The first DNA-based storage architecture that enables random access to data blocks and rewriting of information stored at arbitrary locations within the blocks is described, which suggests that DNA is a versatile medium suitable for both ultrahigh density archival and rewritable storage applications.

Towards practical, high-capacity, low-maintenance information storage in synthesized DNA

Theoretical analysis indicates that the DNA-based storage scheme could be scaled far beyond current global information volumes and offers a realistic technology for large-scale, long-term and infrequently accessed digital archiving.

Robust chemical preservation of digital information on DNA in silica with error-correcting codes.

The original information could be recovered error free, even after treating the DNA in silica at 70 °C for one week, which is thermally equivalent to storing information on DNA in central Europe for 2000 years.

Creation of a Bacterial Cell Controlled by a Chemically Synthesized Genome

The design, synthesis, and assembly of the 1.08–mega–base pair Mycoplasma mycoides JCVI-syn1.0 genome starting from digitized genome sequence information and its transplantation into a M. capricolum recipient cell to create new cells that are controlled only by the synthetic chromosome are reported.

A systematic comparison of error correction enzymes by next-generation sequencing

A method to quantify errors in synthetic DNA by next-generation sequencing is presented. It can quantify differential specificities, for example that ErrASE preferentially corrects C/G → G/C transversions whereas T7 Endonuclease I preferentially corrects A/T → T/A transversions.

Synthesis of high-quality libraries of long (150mer) oligonucleotides by a novel depurination controlled process

It is shown that the depurination side reaction is the limiting factor for the synthesis of libraries of long oligonucleotides on Agilent Technologies’ SurePrint® DNA microarray platform and the characterization of synthesis efficiency for such libraries is reported.

Accurate Whole Human Genome Sequencing using Reversible Terminator Chemistry

An approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost is reported, effective for accurate, rapid and economical whole-genome re-sequencing and many other biomedical applications.