• Corpus ID: 237371678

Deep DNA Storage: Scalable and Robust DNA Storage via Coding Theory and Deep Learning

@article{BarLev2021DeepDS,
  title={Deep DNA Storage: Scalable and Robust DNA Storage via Coding Theory and Deep Learning},
  author={Dan Bar-Lev and Itai Orr and Omer Sabary and Tuvi Etzion and Eitan Yaakobi},
  journal={ArXiv},
  year={2021},
  volume={abs/2109.00031}
}
The concept of DNA storage was first suggested in 1959 by Richard Feynman, who shared his vision of nanotechnology in the talk “There's Plenty of Room at the Bottom”. Later, towards the end of the 20th century, interest in storage solutions based on DNA molecules increased as a result of the Human Genome Project, which in turn led to significant progress in sequencing and assembly methods. DNA storage enjoys major advantages over the well-established magnetic and optical…

Single-Read Reconstruction for DNA Data Storage Using Transformers
TLDR
This work proposes a novel approach for single-read reconstruction using an encoder-decoder Transformer architecture for DNA-based data storage and achieves lower error rates when reconstructing the original data from a single read of each DNA strand compared to state-of-the-art algorithms using 2-3 copies.
On The Decoding Error Weight of One or Two Deletion Channels
TLDR
This paper studies optimal decoding for a special case of the deletion channel, referred to as the k-deletion channel, which deletes exactly k symbols of the transmitted word uniformly at random, to understand how an optimal decoder operates in order to minimize the expected normalized distance.
On the Size of Balls and Anticodes of Small Diameter under the Fixed-Length Levenshtein Metric
TLDR
The minimum, maximum, and average size of a ball of radius one are considered in the FLL metric, which is the right measure for the distance between two words of the same length.
The Input and Output Entropies of the k-Deletion/Insertion Channel
TLDR
This work studies entropy values for the k-deletion and k-insertion channels, where exactly k symbols are deleted from or inserted into the transmitted word, respectively, to establish a conjecture by Atashpendar et al., which claims that for the binary 1-deletion channel the input entropy is maximized for the alternating words.

References

Rewritable Two-Dimensional DNA-Based Data Storage with Machine Learning Reconstruction
TLDR
The results demonstrate that DNA can serve both as a write-once and rewritable memory for heterogeneous data and that data can be erased in a permanent, privacy-preserving manner and can be made robust to degrading channel qualities while avoiding global error-correction redundancy.
Portable and Error-Free DNA-Based Data Storage
TLDR
This work represents the only known random access DNA-based data storage system that uses error-prone nanopore sequencers, while still producing error-free readouts with the highest reported information rate/density.
Scaling up DNA data storage and random access retrieval
TLDR
A novel coding scheme is developed that dramatically reduces the physical redundancy (sequencing read coverage) required for error-free decoding to a median of 5x, while maintaining levels of logical redundancy comparable to the best prior codes.
A Characterization of the DNA Data Storage Channel
TLDR
It is found that errors within molecules are mainly due to synthesis and sequencing, while imperfections in handling and storage lead to a significant loss of sequences.
Data storage in DNA with fewer synthesis cycles using composite DNA letters
TLDR
The development of encoding and decoding methods that exploit information redundancy using composite DNA letters, a representation of a position in a sequence that consists of a mixture of all four DNA nucleotides in a predetermined ratio, is reported.
Towards practical, high-capacity, low-maintenance information storage in synthesized DNA
TLDR
Theoretical analysis indicates that the DNA-based storage scheme could be scaled far beyond current global information volumes and offers a realistic technology for large-scale, long-term and infrequently accessed digital archiving.
Towards Practical and Robust DNA-based Data Archiving by Codec System Named ‘Yin-Yang’
TLDR
This paper proposes a robust DNA-based data storage method based on a new codec algorithm, namely ‘Yin-Yang’, which exhibits great potential for achieving high storage capacity per nucleotide (230 PB/gram) and high fidelity of data recovery.
DNA Fountain enables a robust and efficient storage architecture
TLDR
A storage strategy that is highly robust and approaches the information capacity per nucleotide, and a perfect retrieval from a density of 215 petabytes per gram of DNA, orders of magnitude higher than previous reports, are reported.