A Two-Level Scheme for Quality Score Compression
@article{Voges2018ATS,
title={A Two-Level Scheme for Quality Score Compression},
author={Jan Voges and Ali Reza Fotouhi and J{\"o}rn Ostermann and M. Oguzhan K{\"u}lekci},
journal={Journal of computational biology : a journal of computational molecular cell biology},
year={2018},
volume={25},
number={10},
pages={1141--1151}
}

Previous studies on quality score compression fall into two main lines: lossy schemes and lossless schemes. Lossy schemes allow better management of computational resources. Thus, in practice and for preliminary analyses, bioinformaticians may prefer to work with a lossy quality score representation. However, the original quality scores might be required for a deeper analysis of the data, so it may be necessary to keep them; in addition to lossy compression this requires…
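The abstract describes keeping a lossy representation for routine use while retaining the option to recover the original scores. The following is a minimal, hypothetical sketch of one way such a two-level layout could work, assuming a quantization-plus-residual design: a lossy layer of bin indices and a residual layer that restores the exact scores. The bin edges, the function names (`encode`, `decode_lossy`, `decode_lossless`, `entropy`) and the use of zeroth-order empirical entropy as a size proxy are illustrative assumptions, not the paper's method.

```python
"""Hypothetical two-level (lossy + residual) layout for Phred quality scores.

Assumption: this is an illustrative sketch, not the scheme from the paper.
"""
from bisect import bisect_right
from collections import Counter
from math import log2
from typing import List, Sequence, Tuple

# Hypothetical bin edges on the Phred scale: bin i covers [EDGES[i], EDGES[i + 1]).
EDGES: List[int] = [0, 10, 20, 30, 42]
# Reconstruction value per bin: the integer midpoint of its edges -> [5, 15, 25, 36].
REPS: List[int] = [(lo + hi) // 2 for lo, hi in zip(EDGES, EDGES[1:])]


def bin_index(q: int) -> int:
    """Return the index of the quantization bin that contains score q."""
    if not EDGES[0] <= q < EDGES[-1]:
        raise ValueError(f"quality score {q} outside [{EDGES[0]}, {EDGES[-1]})")
    return bisect_right(EDGES, q) - 1


def encode(scores: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Split scores into a lossy layer (bin indices) and a residual layer."""
    lossy = [bin_index(q) for q in scores]
    residual = [q - REPS[b] for q, b in zip(scores, lossy)]
    return lossy, residual


def decode_lossy(lossy: Sequence[int]) -> List[int]:
    """Approximate the scores using only the lossy layer."""
    return [REPS[b] for b in lossy]


def decode_lossless(lossy: Sequence[int], residual: Sequence[int]) -> List[int]:
    """Restore the exact original scores from both layers."""
    return [REPS[b] + r for b, r in zip(lossy, residual)]


def entropy(symbols: Sequence[int]) -> float:
    """Zeroth-order empirical entropy in bits per symbol (a rough size proxy)."""
    n = len(symbols)
    return -sum(c / n * log2(c / n) for c in Counter(symbols).values())


if __name__ == "__main__":
    scores = [38, 37, 12, 2, 25, 40, 33, 18]
    lossy, residual = encode(scores)
    print(lossy)                # [3, 3, 1, 0, 2, 3, 3, 1]
    print(residual)             # [2, 1, -3, -3, 0, 4, -3, 3]
    print(decode_lossy(lossy))  # [36, 36, 15, 5, 25, 36, 36, 15]
    assert decode_lossless(lossy, residual) == scores
    print(entropy(scores), entropy(lossy))  # 3.0 1.75
```

On this toy input the lossy layer alone stays within 4 Phred units of the originals and lowers the empirical entropy from 3.0 to 1.75 bits per symbol. The residual layer is only read when the exact scores are needed. In practice each layer would presumably be entropy-coded separately.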
One Citation
On the relevance of quality score metadata in genomic sequence data for omics applications
- Biology
- 2019
The authors find that it is possible to compute a threshold value for transparent quality score distortion in sequence alignment, which allows the identification of a “safe” representation for the quality score scale. This aligns with current trends in sequencing platforms, which are pushing for coarser resolutions to reduce the storage footprint of sequence data.
References
Showing 1-10 of 42 references
QualComp: a new lossy compressor for quality scores based on rate distortion theory
- Computer Science; BMC Bioinformatics
- 2012
This paper presents a new scheme for the lossy compression of quality scores that addresses the storage problem, and shows that a significant reduction in size is possible with little compromise in performance on downstream applications (e.g., alignment).
A Cluster-Based Approach to Compression of Quality Scores
- Computer Science; 2016 Data Compression Conference (DCC)
- 2016
A new lossy compressor is proposed that first performs a clustering step, assuming that all quality score sequences come from a mixture of Markov models; it outperforms the previously proposed methods under all analyzed distortion metrics.
Transformations for the compression of FASTQ quality scores of next-generation sequencing data
- Computer Science; Bioinformatics
- 2012
Experiments show that both lossy and lossless transformations are useful, and that simple coding methods, which consume fewer computing resources, are highly competitive, especially when random access to reads is needed.
Lossy compression of quality scores in genomic data
- Computer Science; Bioinformatics
- 2014
This work presents existing compression options for quality score data, and introduces two new lossy techniques that are demonstrably superior to other techniques when assessed against the spectrum of possible trade-offs between storage required and fidelity of representation.
CALQ: compression of quality values of aligned sequencing data
- Computer Science; Bioinformatics
- 2018
This work presents a novel lossy compression scheme named CALQ, which performs as well as or better than the state-of-the-art lossy compressors in terms of variant calling recall and precision for most of the analyzed datasets.
Adaptive reference-free compression of sequence quality scores
- Computer Science; Bioinformatics
- 2014
By aggregating a set of reads into a compressed index, the authors find that the majority of bases can be predicted from the sequence of bases adjacent to them and are therefore likely to be less informative for variant calling or other applications.
Effect of lossy compression of quality scores on variant calling
- Computer Science; Briefings in Bioinformatics
- 2017
It is shown that lossy compression can significantly reduce storage requirements while maintaining variant calling performance comparable to that obtained with the original data, and that in some cases lossy compression can lead to variant calling performance superior to that obtained with the original file.
An Evaluation Framework for Lossy Compression of Genome Sequencing Quality Values
- Computer Science; 2016 Data Compression Conference (DCC)
- 2016
This paper presents the specification and an initial validation of an evaluation framework for comparing lossy compressors of genome sequencing quality values; the functionality of the framework is validated using two state-of-the-art genomic compressors.
SCALCE: boosting sequence compression algorithms using locally consistent encoding
- Computer Science; Bioinformatics
- 2012
This paper presents SCALCE, a 'boosting' scheme based on the Locally Consistent Parsing technique, which reorganizes the reads in a way that yields higher compression speed and compression rate, independent of the compression algorithm in use and without using a reference genome.
Quality score compression improves genotyping accuracy
- Computer Science; Nature Biotechnology
- 2015
This Correspondence recovers quality information directly from sequence data using a compression tool named Quartz, rendering such scores redundant and yielding substantially better space and time efficiencies for storage and analysis.