Z-checker: A framework for assessing lossy compression of scientific data
@article{Tao2019ZcheckerAF,
  title   = {Z-checker: A framework for assessing lossy compression of scientific data},
  author  = {Dingwen Tao and Sheng Di and Hanqi Guo and Zizhong Chen and Franck Cappello},
  journal = {The International Journal of High Performance Computing Applications},
  year    = {2019},
  volume  = {33},
  pages   = {285--303}
}
Because of the vast volume of data being produced by today's scientific simulations and experiments, lossy data compressors that allow user-controlled loss of accuracy during compression are a relevant solution for significantly reducing the data size. However, lossy compressor developers and users lack a tool to explore the features of scientific data sets and to understand the data alteration after compression in a systematic and reliable way. To address this gap, we have designed and…
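The kind of analysis such a framework performs can be illustrated with a minimal sketch (not Z-checker's actual API; the function name `assess_distortion` and the particular metric selection are assumptions): given an original field and its decompressed counterpart, report pointwise and aggregate distortion metrics.

```python
import numpy as np

def assess_distortion(original, reconstructed):
    """Minimal sketch of post-decompression distortion metrics (not Z-checker's API)."""
    orig = np.asarray(original, dtype=np.float64)
    recon = np.asarray(reconstructed, dtype=np.float64)
    diff = recon - orig
    value_range = orig.max() - orig.min()            # assumes a non-constant field
    max_abs_err = np.abs(diff).max()
    rmse = np.sqrt(np.mean(diff ** 2))
    psnr = 20 * np.log10(value_range / rmse) if rmse > 0 else float("inf")
    return {
        "max_abs_error": max_abs_err,
        "max_rel_error": max_abs_err / value_range,  # relative to the value range
        "rmse": rmse,
        "psnr_db": psnr,
    }
```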
29 Citations
SDRBench: Scientific Data Reduction Benchmark for Lossy Compressors
- Computer Science, 2020 IEEE International Conference on Big Data (Big Data)
- 2020
A standard compression assessment benchmark, the Scientific Data Reduction Benchmark (SDRBench), is established; it contains a wide variety of real-world scientific datasets across different domains, summarizes several critical compression quality evaluation metrics, and integrates many state-of-the-art lossy and lossless compressors.
Evaluation of lossless and lossy algorithms for the compression of scientific datasets in netCDF-4 or HDF5 files
- Computer Science, Geoscientific Model Development
- 2019
This study evaluates lossy and lossless compression/decompression methods through netCDF-4 and HDF5 tools on analytical and real scientific floating-point datasets and introduces the Digit Rounding algorithm, a new relative error-bounded data reduction method inspired by the Bit Grooming algorithm.
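Both Bit Grooming and Digit Rounding trade mantissa precision for compressibility ahead of a lossless back end. A much-simplified sketch in the same spirit, assuming plain truncation of float64 mantissa bits (neither Bit Grooming's alternating bit setting nor Digit Rounding's rounding rule):

```python
import numpy as np

def truncate_mantissa(data, keep_bits=20):
    """Zero all but the top `keep_bits` of each float64 mantissa (52 bits total),
    making the array more compressible for a subsequent lossless coder."""
    assert 0 <= keep_bits <= 52
    bits = np.asarray(data, dtype=np.float64).view(np.uint64)
    # build a 64-bit mask that clears the trailing mantissa bits
    mask = np.uint64(~((1 << (52 - keep_bits)) - 1) & 0xFFFFFFFFFFFFFFFF)
    return (bits & mask).view(np.float64)
```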
Feature-preserving Lossy Compression for In Situ Data Analysis
- Computer Science, ICPP Workshops
- 2020
It is shown that the optimal choice of compression parameters varies with data, time, and analysis, and that periodic retuning of the in situ pipeline can improve compression quality; the work also comments on the wider adoption of in situ data analysis and management practices and technologies in the HPC community.
Supplement of Evaluation of lossless and lossy algorithms for the compression of scientific datasets in netCDF-4 or HDF5 files
- Computer Science
- 2019
This study evaluates lossy and lossless compression/decompression methods through netCDF-4 and HDF5 tools on analytical and real scientific floating-point datasets and introduces the Digit Rounding algorithm, a new relative error-bounded data reduction method inspired by the Bit Grooming algorithm.
Efficient Encoding and Reconstruction of HPC Datasets for Checkpoint/Restart
- Computer Science, 2019 35th Symposium on Mass Storage Systems and Technologies (MSST)
- 2019
This work applies a discrete cosine transform with a novel block decomposition strategy directly to double-precision floating-point datasets instead of prevailing prediction-based techniques, showing comparable performance with state-of-the-art lossy compression methods, SZ and ZFP.
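The transform-and-truncate step behind such DCT-based approaches can be sketched as follows (an illustration only, using a fixed 1D block size and a simple keep-the-largest-coefficients rule rather than the paper's actual block decomposition and encoding):

```python
import numpy as np
from scipy.fft import dct, idct

def block_dct_roundtrip(data, block=64, keep=16):
    """Per-block DCT of a 1D double-precision array; keep only the `keep`
    largest-magnitude coefficients per block, then invert. In a real codec the
    kept coefficients would additionally be quantized and entropy coded."""
    x = np.asarray(data, dtype=np.float64)
    pad = (-len(x)) % block
    blocks = np.pad(x, (0, pad)).reshape(-1, block)
    coeffs = dct(blocks, norm="ortho", axis=1)
    cutoff = np.sort(np.abs(coeffs), axis=1)[:, -keep][:, None]   # per-block threshold
    coeffs[np.abs(coeffs) < cutoff] = 0.0
    return idct(coeffs, norm="ortho", axis=1).ravel()[: len(x)]
```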
SZx: an Ultra-fast Error-bounded Lossy Compressor for Scientific Datasets
- Computer Science, ArXiv
- 2022
A novel, generic, ultra-fast error-bounded lossy compression framework called UFZ is proposed, which can obtain fairly high compression performance on both CPU and GPU with reasonably high compression ratios.
Significantly improving lossy compression quality based on an optimized hybrid prediction model
- Computer Science, SC
- 2019
This paper proposes a novel, transform-based predictor and optimizes its compression quality, significantly improves the coefficient-encoding efficiency for the data-fitting predictor, and proposes an adaptive framework that can accurately select the best-fit predictor for different datasets.
Optimizing Lossy Compression Rate-Distortion from Automatic Online Selection between SZ and ZFP
- Computer Science, IEEE Transactions on Parallel and Distributed Systems
- 2019
This paper investigates the principles of SZ and ZFP and proposes an efficient online, low-overhead selection algorithm that predicts the compression quality accurately for the two compressors in early processing stages and selects the best-fit compressor for each data field.
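The selection idea can be sketched generically. The snippet below is an assumption-laden illustration, not the paper's estimator: the compressor callables stand in for SZ and ZFP bindings, and, unlike the paper's low-overhead early-stage prediction, this brute-force version actually runs every candidate.

```python
import numpy as np

def select_compressor(field, error_bound, compressors):
    """Pick the candidate with the best compression ratio that respects the
    pointwise error bound. `compressors` maps a name to a hypothetical callable
    returning (compressed_bytes, reconstructed_array)."""
    field = np.asarray(field, dtype=np.float64)
    best_name, best_ratio = None, 0.0
    for name, compress in compressors.items():
        blob, recon = compress(field, error_bound)
        if np.abs(np.asarray(recon) - field).max() > error_bound:
            continue                                   # bound violated: disqualify
        ratio = field.nbytes / len(blob)
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    return best_name, best_ratio
```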
State of the Art and Future Trends in Data Reduction for High-Performance Computing
- Computer Science, Supercomput. Front. Innov.
- 2020
An overview is provided of leveraging points found in high-performance computing (HPC) systems and of suitable mechanisms to reduce data volumes, along with their respective usage at the application and file-system layers.
Bit-Error Aware Quantization for DCT-based Lossy Compression
- Computer Science, 2020 IEEE High Performance Extreme Computing Conference (HPEC)
- 2020
This paper proposes a bit-efficient quantizer based on the DCTZ framework, develops a unique ordering mechanism based on the quantization table, and extends the encoding index, which can improve the compression ratio of the original DCTZ by 1.38x.
References
Showing 1-10 of 35 references
Exploration of Lossy Compression for Application-Level Checkpoint/Restart
- Computer Science, 2015 IEEE International Parallel and Distributed Processing Symposium
- 2015
A lossy compression technique based on wavelet transformation for checkpoints is proposed, and its impact on application results is explored, showing that the overall checkpoint time including compression is reduced, while relative error remains fairly constant.
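The general idea, wavelet-transforming the checkpoint and discarding small coefficients before storing it, can be sketched as below; the wavelet choice, decomposition level, and keep fraction are placeholders, not the paper's settings.

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_threshold(ckpt, wavelet="db4", level=3, keep_frac=0.05):
    """Sketch: multilevel wavelet decomposition of a 1D checkpoint array,
    zeroing all but the largest `keep_frac` of coefficients before reconstruction."""
    x = np.asarray(ckpt, dtype=np.float64)
    coeffs = pywt.wavedec(x, wavelet, level=level)
    flat = np.concatenate([c.ravel() for c in coeffs])
    thresh = np.quantile(np.abs(flat), 1.0 - keep_frac)     # magnitude cutoff
    kept = [np.where(np.abs(c) >= thresh, c, 0.0) for c in coeffs]
    return pywt.waverec(kept, wavelet)[: x.size]            # trim possible padding sample
```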
Fast Error-Bounded Lossy HPC Data Compression with SZ
- Computer Science, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- 2016
This paper proposes a novel HPC data compression method that works very effectively on compressing large-scale HPC data sets, evaluates it using 13 real-world HPC applications across different scientific domains, and compares it to many other state-of-the-art compression methods.
Fast and Efficient Compression of Floating-Point Data
- Computer Science, IEEE Transactions on Visualization and Computer Graphics
- 2006
This work proposes a simple scheme for lossless, online compression of floating-point data that transparently integrates into the I/O of many applications, and achieves state-of-the-art compression rates and speeds.
Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization
- Computer Science, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- 2017
This work designs a new error-controlled lossy compression algorithm for large-scale scientific data, significantly improving the prediction hitting rate (or prediction accuracy) for each data point based on its nearby data values along multiple dimensions, and derives a series of multilayer prediction formulas and their unified formula in the context of data compression.
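The core error-controlled quantization loop that this line of work builds on can be sketched in one dimension (the paper itself uses multidimensional predictors, handling of unpredictable points, and entropy coding of the quantization codes, none of which appear here):

```python
import numpy as np

def predict_quantize_1d(data, err_bound):
    """1D sketch: predict each value from its reconstructed predecessor and
    quantize the residual into integer codes of bin width 2 * err_bound, so the
    reconstruction error never exceeds err_bound."""
    x = np.asarray(data, dtype=np.float64)
    codes = np.empty(x.size, dtype=np.int64)
    recon = np.empty_like(x)
    prev = 0.0                                    # predictor seed
    for i, v in enumerate(x):
        code = int(round((v - prev) / (2.0 * err_bound)))
        codes[i] = code
        recon[i] = prev + code * 2.0 * err_bound  # the decoder computes the same value
        prev = recon[i]                           # always predict from reconstructed data
    return codes, recon
```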
NUMARCK: Machine Learning Algorithm for Resiliency and Checkpointing
- Computer Science, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis
- 2014
NUMARCK, the Northwestern University Machine learning Algorithm for Resiliency and Checkpointing, is proposed; it makes use of the emerging distributions of data changes between consecutive simulation iterations and encodes them into an indexing space that can be concisely represented.
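A rough sketch of that encoding idea, with quantile bins standing in for the paper's learned clustering of the change distribution (the bin count, the handling of zeros, and the 8-bit index width are all assumptions):

```python
import numpy as np

def encode_iteration_delta(prev_iter, curr_iter, n_bins=256):
    """Sketch: represent relative changes between two simulation iterations as
    small bin indices plus a table of bin centers (assumes n_bins <= 256)."""
    prev = np.asarray(prev_iter, dtype=np.float64)
    curr = np.asarray(curr_iter, dtype=np.float64)
    safe_prev = np.where(prev != 0.0, prev, 1.0)            # crude guard against /0
    change = (curr - prev) / safe_prev
    edges = np.quantile(change, np.linspace(0.0, 1.0, n_bins + 1))
    centers = 0.5 * (edges[:-1] + edges[1:])
    idx = np.clip(np.searchsorted(edges, change) - 1, 0, n_bins - 1)
    approx = prev + safe_prev * centers[idx]                # decoder's reconstruction
    return idx.astype(np.uint8), centers, approx
```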
ISABELA for effective in situ compression of scientific data
- Computer Science, Concurr. Comput. Pract. Exp.
- 2013
The random nature of real‐valued scientific datasets renders lossless compression routines ineffective, and these techniques also impose significant overhead during decompression, making them unsuitable for data analysis and visualization, which require repeated data access.
Fast lossless compression of scientific floating-point data
- Computer Science, Data Compression Conference (DCC'06)
- 2006
A new compression algorithm that is tailored to scientific computing environments where large amounts of floating-point data often need to be transferred between computers as well as to and from storage devices is described and evaluated.
Evaluating lossy data compression on climate simulation data within a large ensemble
- Environmental Science, Computer Science
- 2014
This paper reports on the results of a lossy data compression experiment with output from the CESM Large Ensemble (CESM-LE) Community Project, in which climate scientists are challenged to examine features of the data relevant to their interests, and to identify which of the ensemble members have been compressed and reconstructed.
Universal Numerical Encoder and Profiler Reduces Computing's Memory Wall with Software, FPGA, and SoC Implementations
- Computer Science, 2013 Data Compression Conference
- 2013
The computationally efficient and adaptive APplication AXceleration (APAX) numerical encoding method is presented to reduce the memory wall for integers and floating-point operands; it also quantifies the degree of uncertainty (accuracy) in numerical datasets.
A methodology for evaluating the impact of data compression on climate simulation data
- Environmental Science, Computer Science, HPDC '14
- 2014
It is found that the diversity of the climate data requires the individual treatment of variables, and, in doing so, the reconstructed data can fall within the natural variability of the system, while achieving compression rates of up to 5:1.