Pareto Optimal Compression of Genomic Dictionaries, with or without Random Access in Main Memory

Authors: Raffaele Giancarlo and Gennaro Grimaudo

Compressive biological sequence analysis and archival in the era of high-throughput sequencing technologies

This review classifies the main proposed techniques along three research directions that have emerged from the literature and, for each direction, gives an overview of the current techniques.

Data Compression: Methods and Theory

Books and the internet are the recommended media to help you improve your quality and performance.

Techniques for Inverted Index Compression

This article surveys the encoding algorithms suitable for inverted index compression and characterizes the performance of the inverted index through experimentation.

From Theory to Practice: Plug and Play with Succinct Data Structures

This paper presents a framework for experimentation with succinct data structures, providing a large set of configurable components, together with tests, benchmarks, and tools to analyze resource requirements.

MFCompress: a compression tool for FASTA and multi-FASTA data

MFCompress, a tool designed specifically for compressing FASTA and multi-FASTA files, is described; it can provide additional average compression gains of almost 50%, potentially doubling the available storage, at the cost of some more computation time.

Bicriteria data compression

This paper introduces the Bicriteria LZ77-Parsing problem, which formalizes in a principled way what data compressors have traditionally approached by means of heuristics, and solves it efficiently in O(n log^2 n) time and optimal linear space within a small additive approximation.

Comparison of high-throughput sequencing data compression tools

A benchmarking study of available compression methods on a comprehensive set of HTS data using an automated framework to report on the development of compression methods for high-throughput sequencing data size reduction.

Binary Interpolative Coding for Effective Index Compression

A new method for compressing inverted indexes is introduced that yields excellent compression and fast decoding, and that exploits clustering: the tendency for words to appear relatively frequently in some parts of the collection and infrequently in others.

Space-efficient representation of genomic k-mer count tables

This work designs an efficient representation of k-mer count tables supporting fast random-access queries, and proposes to apply Compressed Static Functions (CSFs), with space proportional to the empirical zero-order entropy of the counts.

Simplitigs as an efficient and scalable representation of de Bruijn graphs

Simplitigs are introduced as a compact, efficient, and scalable representation, together with ProphAsm, a fast algorithm for their computation; simplitigs are shown to provide a substantial improvement in the cumulative sequence length and their number.