Pareto Optimal Compression of Genomic Dictionaries, with or without Random Access in Main Memory
@article{Giancarlo2022ParetoOC,
  title={Pareto Optimal Compression of Genomic Dictionaries, with or without Random Access in Main Memory},
  author={Raffaele Giancarlo and Gennaro Grimaudo},
  journal={ArXiv},
  year={2022},
  volume={abs/2212.03067}
}
References
Compressive biological sequence analysis and archival in the era of high-throughput sequencing technologies
- Computer Science
- Briefings Bioinform.
- 2014
This review provides a classification of the main techniques that have been proposed, according to three specific research directions that have emerged from the literature and, for each, an overview of the current techniques.
Data Compression: Methods and Theory
- Medicine
- 1987
Techniques for Inverted Index Compression
- Computer Science
- ACM Comput. Surv.
- 2021
This article surveys the encoding algorithms suitable for inverted index compression and characterizes the performance of the inverted index through experimentation.
From Theory to Practice: Plug and Play with Succinct Data Structures
- Computer Science
- SEA
- 2014
This paper presents a framework for experimentation with succinct data structures, providing a large set of configurable components, together with tests, benchmarks, and tools to analyze resource requirements.
MFCompress: a compression tool for FASTA and multi-FASTA data
- Computer Science
- Bioinform.
- 2014
MFCompress, a tool designed specifically for compressing FASTA and multi-FASTA files, is described. It can provide additional average compression gains of almost 50%, potentially doubling the available storage, at the cost of some additional computation time.
Bicriteria data compression
- Computer Science
- SODA
- 2014
The Bicriteria LZ77-Parsing problem is introduced, which formalizes in a principled way what data-compressors have traditionally approached by means of heuristics, and solves this problem efficiently in O(n log^2 n) time and optimal linear space within a small, additive approximation.
Comparison of high-throughput sequencing data compression tools
- Biology, Physics
- Nature Methods
- 2016
A benchmarking study of available compression methods on a comprehensive set of HTS data, using an automated framework, that reports on the development of compression methods for reducing the size of high-throughput sequencing data.
Binary Interpolative Coding for Effective Index Compression
- Computer Science
- Information Retrieval
- 2004
A new method for compressing inverted indexes is introduced that yields excellent compression, fast decoding, and exploits clustering—the tendency for words to appear relatively frequently in some parts of the collection and infrequently in others.
Space-efficient representation of genomic k-mer count tables
- Computer Science
- WABI
- 2021
This work designs an efficient representation of k-mer count tables supporting fast random-access queries, and proposes to apply Compressed Static Functions (CSFs), with space proportional to the empirical zero-order entropy of the counts.
Simplitigs as an efficient and scalable representation of de Bruijn graphs
- Biology
- bioRxiv
- 2020
Simplitigs are introduced as a compact, efficient, and scalable representation, together with ProphAsm, a fast algorithm for their computation. It is demonstrated that simplitigs provide a substantial improvement in the cumulative sequence length and their number.