Capacity Results for the Noisy Shuffling Channel

@inproceedings{Shomorony2019CapacityRF,
  title={Capacity Results for the Noisy Shuffling Channel},
  author={Ilan Shomorony and Reinhard Heckel},
  booktitle={2019 IEEE International Symposium on Information Theory (ISIT)},
  year={2019},
  pages={762--766}
}
Motivated by DNA-based storage, we study the noisy shuffling channel, which can be seen as the concatenation of a standard noisy channel (such as the BSC) and a shuffling channel, which breaks the data block into small pieces and shuffles them. This channel models a DNA storage system by capturing two of its key aspects: (1) the data is written onto many short DNA molecules that are stored in an unordered way and (2) the molecules are corrupted by noise at synthesis, sequencing, and during…
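
The channel described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: the function name `noisy_shuffling_channel` and the parameters `piece_len` and `flip_prob` are hypothetical, and it assumes binary input, a BSC as the noisy channel, and equal-length pieces.

```python
import random


def noisy_shuffling_channel(bits, piece_len, flip_prob, rng=None):
    """Simulate a BSC followed by a shuffling channel (illustrative only).

    bits      -- list of 0/1 values; its length must be a multiple of piece_len
    piece_len -- assumed length of each piece (hypothetical parameter)
    flip_prob -- BSC crossover probability (hypothetical parameter)
    rng       -- optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    if piece_len <= 0 or len(bits) % piece_len != 0:
        raise ValueError("len(bits) must be a positive multiple of piece_len")

    # Noisy channel: flip each bit independently with probability flip_prob.
    noisy = [b ^ (rng.random() < flip_prob) for b in bits]

    # Shuffling channel: cut the block into pieces and discard their order.
    pieces = [tuple(noisy[i:i + piece_len])
              for i in range(0, len(noisy), piece_len)]
    rng.shuffle(pieces)
    return pieces


if __name__ == "__main__":
    data = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0]
    print(noisy_shuffling_channel(data, piece_len=4, flip_prob=0.1))
```

With `flip_prob=0` the output is simply the multiset of input pieces in random order, which makes visible the loss of ordering information that a code for this channel has to overcome.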

Achieving the Capacity of a DNA Storage Channel with Linear Coding Schemes
TLDR
This work considers a multi-draw DNA storage channel in the setting of noise corruption by a binary erasure channel and shows that, in this setting, the capacity is achieved by linear coding schemes.
Capacity of the Erasure Shuffling Channel
TLDR
This work studies the erasure shuffling channel, which takes multiple strings as input, passes them through an erasure channel, and then shuffles them out of order. It shows that the capacity equals the capacity of the binary erasure channel, C_BEC, minus a term that captures the loss of ordering information due to shuffling.
DNA-Based Storage: Models and Fundamental Limits
TLDR
This work introduces a new channel model, called the noisy shuffling-sampling channel, which captures three key distinctive aspects of DNA storage systems: (1) the data is written onto many short DNA molecules; (2) the molecules are corrupted by noise during synthesis and sequencing; and (3) the data is read by randomly sampling from the DNA pool.
An Upper Bound on the Capacity of the DNA Storage Channel
TLDR
An information-theoretic study of the storage channel (the entity that formulates the relation between stored and sequenced strands) is conducted, and an upper bound on the Shannon capacity of the channel is derived.
Achievable Rates of Concatenated Codes in DNA Storage under Substitution Errors
TLDR
A modified concatenated coding scheme is proposed that combines several strands into one inner block, which narrows the gap and achieves rates close to the capacity.
Achieving the Capacity of the DNA Storage Channel
TLDR
This work proves the achievability of a recently published upper bound on the Shannon capacity of this channel for a large range of parameters by proposing and analyzing a decoder that clusters received strands according to their similarity and then efficiently estimates the original strands based on these clusters.
Reassembly Codes for the Chop-and-Shuffle Channel
TLDR
The results show that the decoding error decreases as the input length increases, and the method has a significantly lower complexity than the baseline brute-force approach.
On Coding Over Sliced Information
TLDR
The surprising result in this paper is that while the information vector is sliced into a set of unordered strings, the amount of redundant bits that are required to correct errors is order-wise equivalent to the amount required in the classical error correcting paradigm.
Robust Indexing - Optimal Codes for DNA Storage
TLDR
This paper presents an order-wise optimal construction of codes that correct multiple substitution errors for this channel model. It uses robust indexing: instead of using fixed indices to create order in unordered strings, it uses indices that are information dependent and thus eliminates unnecessary redundancy.
Coding Theorems for Noisy Permutation Channels
  • A. Makur
  • Computer Science, Mathematics
    IEEE Transactions on Information Theory
  • 2020
TLDR
The achievability proof yields a conceptually simple, computationally efficient, and capacity-achieving coding scheme for such DMCs, and the results suggest that noisy permutation channel capacities are generally quite agnostic to the parameters that define the DMCs.

References

A Characterization of the DNA Data Storage Channel
TLDR
It is found that errors within molecules are mainly due to synthesis and sequencing, while imperfections in handling and storage lead to a significant loss of sequences.
Coding over Sets for DNA Storage
TLDR
By deriving upper bounds on the cardinalities of these codes using sphere packing arguments, it is shown that many of the codes are close to optimal.
Fundamental limits of DNA storage systems
TLDR
The storage capacity of DNA-based storage systems is characterized under a simple model, and it is shown that a simple index-based coding scheme is optimal.
A Rewritable, Random-Access DNA-Based Storage System
TLDR
The first DNA-based storage architecture that enables random access to data blocks and rewriting of information stored at arbitrary locations within the blocks is described, which suggests that DNA is a versatile medium suitable for both ultrahigh-density archival and rewritable storage applications.
Codes in the Space of Multisets—Coding for Permutation Channels With Impairments
TLDR
The construction based on the notion of Sidon sets in finite Abelian groups is shown to be optimal, in the sense of the asymptotic scaling of code redundancy, for any error radius and alphabet size.
Sequence-Subset Distance and Coding for Error Control for DNA-based Data Storage
  • Wentu Song, K. Cai
  • Computer Science
    2019 IEEE International Symposium on Information Theory (ISIT)
  • 2019
TLDR
Some upper bounds on the size of the sequence-subset codes are derived including a tight bound for a special case and a Singleton-like bound, and some constructions of such codes are presented.
Codes for DNA Sequence Profiles
TLDR
This work introduces the DNA storage channel, models the read process through the use of profile vectors, and proposes new asymmetric coding techniques to combat the effects of synthesis and sequencing noise.
DNA Fountain enables a robust and efficient storage architecture
TLDR
A storage strategy is reported that is highly robust and approaches the information capacity per nucleotide, along with perfect retrieval from a density of 215 petabytes per gram of DNA, orders of magnitude higher than previous reports.
Anchor-Based Correction of Substitutions in Indexed Sets
TLDR
This paper proposes a construction that efficiently deals with the challenges that arise when designing codes for unordered sets, and suggests that it requires less redundancy to correct errors in the indices than in the data part of vectors.
Towards practical, high-capacity, low-maintenance information storage in synthesized DNA
TLDR
Theoretical analysis indicates that the DNA-based storage scheme could be scaled far beyond current global information volumes and offers a realistic technology for large-scale, long-term and infrequently accessed digital archiving.