Deep DNA Storage: Scalable and Robust DNA Storage via Coding Theory and Deep Learning
@article{BarLev2021DeepDS,
  title={Deep DNA Storage: Scalable and Robust DNA Storage via Coding Theory and Deep Learning},
  author={Dan Bar-Lev and Itai Orr and Omer Sabary and Tuvi Etzion and Eitan Yaakobi},
  journal={ArXiv},
  year={2021},
  volume={abs/2109.00031}
}
The concept of DNA storage was first suggested in 1959 by Richard Feynman, who shared his vision of nanotechnology in the talk “There's Plenty of Room at the Bottom”. Later, towards the end of the 20th century, interest in storage solutions based on DNA molecules increased as a result of the Human Genome Project, which in turn led to significant progress in sequencing and assembly methods. DNA storage enjoys major advantages over the well-established magnetic and optical…
4 Citations
Single-Read Reconstruction for DNA Data Storage Using Transformers
- Computer Science
- ArXiv
- 2021
This work proposes a novel approach for single-read reconstruction in DNA-based data storage using an encoder-decoder Transformer architecture, and achieves lower error rates when reconstructing the original data from a single read of each DNA strand than state-of-the-art algorithms achieve using 2-3 copies.
On The Decoding Error Weight of One or Two Deletion Channels
- Computer Science
- ArXiv
- 2022
This paper studies optimal decoding for a special case of the deletion channel, referred to as the k-deletion channel, which deletes exactly k symbols of the transmitted word uniformly at random, to understand how an optimal decoder operates in order to minimize the expected normalized distance.
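The channel described in this summary can be illustrated with a minimal sketch. It assumes that "uniformly at random" means the set of k deleted positions is drawn uniformly from all k-subsets of positions. The function name `k_deletion_channel` and its parameters are hypothetical and are not taken from the paper.

```python
import random


def k_deletion_channel(word, k, rng=None):
    """Return `word` with exactly k symbols deleted.

    Assumption: the k deleted positions are chosen uniformly at random
    among all k-subsets of positions. Name and signature are illustrative.
    """
    if not 0 <= k <= len(word):
        raise ValueError("k must be between 0 and len(word)")
    if rng is None:
        rng = random.Random()
    # Sample k distinct positions, then keep every symbol not at one of them.
    deleted = set(rng.sample(range(len(word)), k))
    return "".join(s for i, s in enumerate(word) if i not in deleted)
```

For example, `k_deletion_channel("ACGTACGT", 2)` returns some length-6 subsequence of the input. Different deletion sets can produce the same output word, which is part of what makes decoding for this channel non-trivial.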
On the Size of Balls and Anticodes of Small Diameter under the Fixed-Length Levenshtein Metric
- Computer Science
- ArXiv
- 2022
The minimum, maximum, and average sizes of a ball of radius one are considered in the fixed-length Levenshtein (FLL) metric, which is the right measure for the distance between two words of the same length.
The Input and Output Entropies of the k-Deletion/Insertion Channel
- Computer Science
- ArXiv
- 2022
This work studies entropy values for the k-deletion and k-insertion channels, in which exactly k symbols are deleted from or inserted into the transmitted word, respectively, in order to establish a conjecture by Atashpendar et al., which claims that for the binary 1-deletion channel the input entropy is maximized by the alternating words.
References
SHOWING 1-10 OF 48 REFERENCES
Rewritable Two-Dimensional DNA-Based Data Storage with Machine Learning Reconstruction
- Computer Science
- bioRxiv
- 2021
The results demonstrate that DNA can serve as both a write-once and a rewritable memory for heterogeneous data, and that data can be erased in a permanent, privacy-preserving manner and can be made robust to degrading channel qualities while avoiding global error-correction redundancy.
Single-Read Reconstruction for DNA Data Storage Using Transformers
- Computer Science
- ArXiv
- 2021
This work proposes a novel approach for single-read reconstruction in DNA-based data storage using an encoder-decoder Transformer architecture, and achieves lower error rates when reconstructing the original data from a single read of each DNA strand than state-of-the-art algorithms achieve using 2-3 copies.
Portable and Error-Free DNA-Based Data Storage
- Computer Science
- Scientific Reports
- 2017
This work represents the only known random access DNA-based data storage system that uses error-prone nanopore sequencers, while still producing error-free readouts with the highest reported information rate/density.
Scaling up DNA data storage and random access retrieval
- Computer Science
- bioRxiv
- 2017
A novel coding scheme is developed that dramatically reduces the physical redundancy (sequencing read coverage) required for error-free decoding to a median of 5x, while maintaining levels of logical redundancy comparable to the best prior codes.
A Characterization of the DNA Data Storage Channel
- Computer Science
- Scientific Reports
- 2019
It is found that errors within molecules are mainly due to synthesis and sequencing, while imperfections in handling and storage lead to a significant loss of sequences.
Data storage in DNA with fewer synthesis cycles using composite DNA letters
- Computer Science
- Nature Biotechnology
- 2019
The development of encoding and decoding methods that exploit information redundancy using composite DNA letters, a representation of a position in a sequence that consists of a mixture of all four DNA nucleotides in a predetermined ratio, is reported.
Towards practical, high-capacity, low-maintenance information storage in synthesized DNA
- Computer Science
- Nature
- 2013
Theoretical analysis indicates that the DNA-based storage scheme could be scaled far beyond current global information volumes and offers a realistic technology for large-scale, long-term and infrequently accessed digital archiving.
Towards Practical and Robust DNA-based Data Archiving by Codec System Named ‘Yin-Yang’
- Computer Science
- 2019
This paper proposes a robust DNA-based data storage method based on a new codec algorithm, namely ‘Yin-Yang’, which exhibits great potential at achieving high storing capacity per nucleotide (230 PB/gram) and high fidelity of data recovery.
DNA Fountain enables a robust and efficient storage architecture
- Computer Science, Biology
- Science
- 2017
A storage strategy that is highly robust and approaches the information capacity per nucleotide is reported, along with perfect retrieval from a density of 215 petabytes per gram of DNA, orders of magnitude higher than previous reports.