Trellis BMA: Coded Trace Reconstruction on IDS Channels for DNA Storage

Sundara Rajan Srinivasavaradhan, Sivakanth Gopi, Henry D. Pfister, Sergey Yekhanin
2021 IEEE International Symposium on Information Theory (ISIT)
Sequencing a DNA strand, as part of the read process in DNA storage, produces multiple noisy copies, which can be combined to produce better estimates of the original strand; this is called trace reconstruction. One can reduce the error rate further by introducing redundancy in the write sequence; this is called coded trace reconstruction. In this paper, we model the DNA storage channel as an insertion-deletion-substitution (IDS) channel and design both encoding schemes and low-complexity…
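To make the channel model concrete, here is a minimal Python sketch of how one strand might be turned into several noisy traces by an IDS channel. It is not the paper's model or code: the i.i.d. per-base error process, the function names `ids_channel` and `make_traces`, and the default probabilities are all assumptions chosen for illustration.

```python
import random

ALPHABET = "ACGT"


def ids_channel(strand, p_ins, p_del, p_sub, rng):
    """Pass one strand through a simple i.i.d. IDS channel.

    Assumed model (illustrative only): each input base independently
    suffers at most one event, chosen with the given probabilities.
    """
    out = []
    for base in strand:
        r = rng.random()
        if r < p_ins:
            # Insertion: emit a random base, then the original base.
            out.append(rng.choice(ALPHABET))
            out.append(base)
        elif r < p_ins + p_del:
            # Deletion: the base is dropped.
            continue
        elif r < p_ins + p_del + p_sub:
            # Substitution: replace with a different base.
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            # No error: the base passes through unchanged.
            out.append(base)
    return "".join(out)


def make_traces(strand, num_traces, p_ins=0.02, p_del=0.02, p_sub=0.02, seed=0):
    """Generate independent noisy copies (traces) of the same strand."""
    rng = random.Random(seed)
    return [ids_channel(strand, p_ins, p_del, p_sub, rng) for _ in range(num_traces)]


if __name__ == "__main__":
    for trace in make_traces("ACGTTGCAACGT", num_traces=4, p_ins=0.1, p_del=0.1, p_sub=0.1):
        print(trace)
```

Because insertions and deletions change the length of each trace, the copies are not aligned position by position, which is why combining them into a single estimate is harder than a per-position majority vote.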


Deep DNA Storage: Scalable and Robust DNA Storage via Coding Theory and Deep Learning
This work proposes a robust, efficient, and scalable solution for implementing DNA-based storage systems. It deploys deep neural networks (DNNs) that reconstruct a sequence of letters from an imperfect cluster of copies generated by the synthesis and sequencing processes.
Single-Read Reconstruction for DNA Data Storage Using Transformers
This work proposes a novel approach to single-read reconstruction for DNA-based data storage using an encoder-decoder Transformer architecture. Reconstructing the original data from a single read of each DNA strand, it achieves lower error rates than state-of-the-art algorithms that use 2-3 copies.


Coded Trace Reconstruction
This work begins the study of coded trace reconstruction: the design and analysis of high-rate, efficiently encodable codes that can be efficiently decoded with high probability from few reads corrupted by edit errors. It starts by analyzing marker-based code constructions coupled with worst-case trace reconstruction algorithms.
Reconstruction Algorithms for DNA-Storage Systems
This work presents several new algorithms for DNA reconstruction problems that look globally at the entire sequence of the traces and use the dynamic programming algorithms used for the shortest common supersequence and longest common subsequence problems to decode the original sequence.
Achieving the Capacity of the DNA Storage Channel
This work proves the achievability of a recently published upper bound on the Shannon capacity of the DNA storage channel for a large range of parameters by proposing and analyzing a decoder that clusters received strands according to their similarity and then efficiently estimates the original strands from these clusters.
Overcoming High Nanopore Basecaller Error Rates for DNA Storage Via Basecaller-Decoder Integration and Convolutional Codes
This work proposes a novel approach that overcomes the high error rates in basecalled sequences by integrating a Viterbi error-correction decoder with the basecaller, enabling the decoder to exploit the soft information available in the deep-learning-based basecaller pipeline.
Low cost DNA data storage using photolithographic synthesis and advanced information reconstruction and error correction
This work demonstrates a DNA storage system that relies on massively parallel light-directed synthesis, which is considerably cheaper than conventional solid-phase synthesis. However, this technology has a high sequence error rate when optimized for speed.
Portable and Error-Free DNA-Based Data Storage
This work represents the only known random access DNA-based data storage system that uses error-prone MinION sequencers and produces error-free readouts with the highest reported information rate and density.
DNA-Based Storage: Models and Fundamental Limits
This work introduces a new channel model, called the noisy shuffling-sampling channel, which captures three key distinctive aspects of DNA storage systems: (1) the data is written onto many short DNA molecules; (2) the molecules are corrupted by noise during synthesis and sequencing; and (3) the data is read by randomly sampling from the DNA pool.
Algorithms for Reconstruction Over Single and Multiple Deletion Channels
It is shown that solving for the ML estimate over the single deletion channel is equivalent to solving its relaxation, a continuous optimization problem, and the symbolwise posterior distributions are computed exactly for both the single and multiple deletion channels.
Concatenated Codes for Recovery From Multiple Reads of DNA Sequences
This paper proposes two new decoding algorithms for inference from multiple received sequences, both of which combine the inner code and the channel into a joint hidden Markov model to infer symbolwise a posteriori probabilities (APPs).
Robust chemical preservation of digital information on DNA in silica with error-correcting codes.
The original information could be recovered error free, even after treating the DNA in silica at 70 °C for one week, which is thermally equivalent to storing information on DNA in central Europe for 2000 years.