Corpus ID: 237491608

Single-Read Reconstruction for DNA Data Storage Using Transformers

Yotam Nahum, Eyar Ben-Tolila, Leon Anavy
As the global need for large-scale data storage rises exponentially, existing storage technologies are approaching their theoretical and functional limits in terms of density and energy consumption, making DNA-based storage a potential solution for the future of data storage. Several studies have introduced DNA-based storage systems with high information density (petabytes/gram). However, DNA synthesis and sequencing technologies yield erroneous outputs. Algorithmic approaches for correcting…
1 Citation


Deep DNA Storage: Scalable and Robust DNA Storage via Coding Theory and Deep Learning
This work proposes a robust, efficient, and scalable DNA-based storage system that deploys deep neural networks (DNNs) to reconstruct a sequence of letters from an imperfect cluster of copies generated by the synthesis and sequencing processes.


Rewritable Two-Dimensional DNA-Based Data Storage with Machine Learning Reconstruction
The results show that DNA can serve as both a write-once and a rewritable memory for heterogeneous data, and that the storage density of the molecules can be increased by using different encoding dimensions and avoiding error-correction redundancy.
Overcoming High Nanopore Basecaller Error Rates for DNA Storage Via Basecaller-Decoder Integration and Convolutional Codes
This work proposes a novel approach that overcomes the high error rates in basecalled sequences by integrating a Viterbi error-correction decoder with the basecaller, enabling the decoder to exploit the soft information available in the deep-learning-based basecaller pipeline.
Improved read/write cost tradeoff in DNA-based data storage using LDPC codes
This scheme breaks with the traditional separation framework and instead uses a single large block-length LDPC code for both erasure and error correction, and introduces novel techniques to handle insertion and deletion errors introduced by the synthesis process.
Trellis BMA: Coded Trace Reconstruction on IDS Channels for DNA Storage
This work introduces Trellis BMA, a new reconstruction algorithm whose complexity is linear in the number of traces, and compares its performance with previous algorithms, showing that it reduces the error rate on both simulated and experimental data.
Molecular digital data storage using DNA
This paper discusses how DNA can be adopted as a storage medium for custom data, as a potential future complement to current data storage media such as computer hard disks, optical disks, and tape.
A Characterization of the DNA Data Storage Channel
It is found that errors within molecules are mainly due to synthesis and sequencing, while imperfections in handling and storage lead to a significant loss of sequences.
Towards practical, high-capacity, low-maintenance information storage in synthesized DNA
Theoretical analysis indicates that the DNA-based storage scheme could be scaled far beyond current global information volumes and offers a realistic technology for large-scale, long-term and infrequently accessed digital archiving.
Data storage in DNA with fewer synthesis cycles using composite DNA letters
This work reports the development of encoding and decoding methods that exploit information redundancy using composite DNA letters, where a composite letter is a representation of a position in a sequence that consists of a mixture of all four DNA nucleotides in a predetermined ratio.
DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome
DNABERT is a novel pre-trained bidirectional encoder representation that forms a global and transferable understanding of genomic DNA sequences based on upstream and downstream nucleotide contexts, and it can be readily applied to other organisms with exceptional performance.
DNA Fountain enables a robust and efficient storage architecture
This work reports a storage strategy that is highly robust and approaches the information capacity per nucleotide, along with perfect retrieval from a density of 215 petabytes per gram of DNA, orders of magnitude higher than previous reports.