Z-checker: A framework for assessing lossy compression of scientific data

@article{Tao2019ZcheckerAF,
  title={Z-checker: A framework for assessing lossy compression of scientific data},
  author={Dingwen Tao and Sheng Di and Hanqi Guo and Zizhong Chen and Franck Cappello},
  journal={The International Journal of High Performance Computing Applications},
  year={2019},
  volume={33},
  pages={285--303}
}
  • Dingwen Tao, Sheng Di, Hanqi Guo, Zizhong Chen, Franck Cappello
  • Published 12 June 2017
  • Computer Science
  • The International Journal of High Performance Computing Applications
Because of the vast volume of data being produced by today’s scientific simulations and experiments, lossy data compressors that allow user-controlled loss of accuracy during compression are a relevant solution for significantly reducing the data size. However, lossy compressor developers and users are missing a tool to explore the features of scientific data sets and understand the data alteration after compression in a systematic and reliable way. To address this gap, we have designed and…
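The kind of assessment such a framework automates can be illustrated by computing a few of the standard quality metrics (compression ratio, maximum absolute error, value-range-based PSNR) between an original field and its decompressed counterpart. The sketch below is a minimal Python/NumPy illustration of these metrics, not code from Z-checker itself; the function and argument names are assumptions.

    import numpy as np

    def assess(original, decompressed, compressed_nbytes):
        """Compute a few common lossy-compression quality metrics.
        `original` and `decompressed` are NumPy arrays of the same shape;
        `compressed_nbytes` is the size of the compressed stream in bytes."""
        diff = decompressed.astype(np.float64) - original.astype(np.float64)
        value_range = float(original.max() - original.min())
        max_abs_err = float(np.max(np.abs(diff)))
        mse = float(np.mean(diff ** 2))
        # PSNR is reported relative to the data value range, as is common
        # for scientific floating-point data.
        psnr = 10.0 * np.log10(value_range ** 2 / mse) if mse > 0 else float("inf")
        return {"compression_ratio": original.nbytes / compressed_nbytes,
                "max_abs_error": max_abs_err,
                "psnr_db": psnr}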
SDRBench: Scientific Data Reduction Benchmark for Lossy Compressors
TLDR
A standard compression assessment benchmark, the Scientific Data Reduction Benchmark (SDRBench), is established; it contains a wide variety of real-world scientific datasets across different domains, summarizes several critical compression quality evaluation metrics, and integrates many state-of-the-art lossy and lossless compressors.
Evaluation of lossless and lossy algorithms for the compression of scientific datasets in netCDF-4 or HDF5 files
TLDR
This study evaluates lossy and lossless compression/decompression methods through netCDF-4 and HDF5 tools on analytical and real scientific floating-point datasets and introduces the Digit Rounding algorithm, a new relative error-bounded data reduction method inspired by the Bit Grooming algorithm.
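The Bit Grooming idea that Digit Rounding builds on can be sketched as follows: keep only as many IEEE 754 mantissa bits as are needed for a requested number of significant decimal digits and zero the rest, so that a lossless back end (e.g., the deflate filter in netCDF-4/HDF5) compresses the data much better. The snippet below is a simplified bit-shaving illustration only, without Bit Grooming's alternating shave/set or Digit Rounding's per-value analysis; names are assumptions.

    import numpy as np

    def shave_mantissa(a, nsd):
        """Zero low-order mantissa bits of float64 values, keeping roughly
        `nsd` significant decimal digits."""
        keep_bits = int(np.ceil(nsd * np.log2(10.0)))   # ~3.32 bits per decimal digit
        drop_bits = max(52 - keep_bits, 0)              # float64 has 52 mantissa bits
        mask = ~np.uint64((1 << drop_bits) - 1)
        bits = np.asarray(a, dtype=np.float64).view(np.uint64)
        return (bits & mask).view(np.float64)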
Feature-preserving Lossy Compression for In Situ Data Analysis
TLDR
It is shown that the optimal choice of compression parameters varies with data, time, and analysis, and that periodic retuning of the in situ pipeline can improve compression quality; the wider adoption of in situ data analysis and management practices and technologies in the HPC community is also discussed.
Supplement of Evaluation of lossless and lossy algorithms for the compression of scientific datasets in netCDF-4 or HDF5 files
TLDR
This study evaluates lossy and lossless compression/decompression methods through netCDF-4 and HDF5 tools on analytical and real scientific floating-point datasets and introduces the Digit Rounding algorithm, a new relative error-bounded data reduction method inspired by the Bit Grooming algorithm.
Efficient Encoding and Reconstruction of HPC Datasets for Checkpoint/Restart
TLDR
This work applies a discrete cosine transform with a novel block decomposition strategy directly to double-precision floating point datasets instead of prevailing prediction-based techniques, showing comparable performance with state-of-the-art lossy compression methods, SZ and ZFP.
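The basic mechanism (transform a block, quantize the coefficients uniformly, and hand the resulting small integers to an entropy coder) can be sketched as follows. This is a generic blockwise-DCT illustration under an assumed SciPy dependency, not the paper's block decomposition or encoding; the block size and quantization step are arbitrary.

    import numpy as np
    from scipy.fft import dct, idct

    def compress_block(block, step):
        """DCT-transform a 1-D block of doubles and uniformly quantize
        the coefficients into small integers."""
        return np.round(dct(block, norm="ortho") / step).astype(np.int32)

    def decompress_block(qcoeffs, step):
        return idct(qcoeffs * step, norm="ortho")

    data = np.random.default_rng(0).normal(size=4096)
    blocks = data.reshape(-1, 64)                                  # 64-element blocks
    q = [compress_block(b, step=1e-3) for b in blocks]
    recon = np.concatenate([decompress_block(b, 1e-3) for b in q])
    print("max abs error:", np.max(np.abs(recon - data)))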
SZx: an Ultra-fast Error-bounded Lossy Compressor for Scientific Datasets
TLDR
A novel, generic, ultra-fast error-bounded lossy compression framework called UFZ is proposed, which obtains fairly high compression performance on both CPU and GPU, with reasonably high compression ratios.
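One reason such compressors reach very high throughput is that they work on small fixed-size blocks and treat "nearly constant" blocks with almost no computation. The snippet below is a rough illustration of that block-classification idea under an absolute error bound eb, not SZx's actual algorithm.

    import numpy as np

    def classify_blocks(data, block_size, eb):
        """Split `data` into fixed-size blocks; a block whose spread is at
        most 2*eb can be replaced by its midpoint while keeping every value
        within the absolute error bound eb."""
        n = len(data) // block_size * block_size
        blocks = data[:n].reshape(-1, block_size)
        lo, hi = blocks.min(axis=1), blocks.max(axis=1)
        constant = (hi - lo) <= 2 * eb
        midpoints = (hi + lo) / 2.0
        # Non-constant blocks would fall back to a cheap bit-level encoding
        # in a real compressor; here they are only flagged.
        return constant, midpoints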
Significantly improving lossy compression quality based on an optimized hybrid prediction model
TLDR
This paper proposes a novel transform-based predictor and optimizes its compression quality, significantly improves the coefficient-encoding efficiency for the data-fitting predictor, and proposes an adaptive framework that can accurately select the best-fit predictor for different datasets.
Optimizing Lossy Compression Rate-Distortion from Automatic Online Selection between SZ and ZFP
TLDR
This paper investigates the principles of SZ and ZFP and proposes an efficient online, low-overhead selection algorithm that predicts the compression quality accurately for two compressors in early processing stages and selects the best-fit compressor for each data field.
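The selection idea can be illustrated by running each candidate compressor on a small sample of a field, measuring rate and distortion, and keeping the better one. In the sketch below, `candidates` is a placeholder mapping to user-supplied (compress, decompress) callables, not the actual SZ or ZFP APIs, and the PSNR-per-bit score is just one possible rate-distortion proxy.

    import numpy as np

    def psnr(orig, recon):
        rng = float(orig.max() - orig.min())
        mse = float(np.mean((orig - recon) ** 2))
        return float("inf") if mse == 0 else 10.0 * np.log10(rng ** 2 / mse)

    def select_compressor(field, candidates, sample_size=65536):
        """Pick the candidate with the best PSNR-per-bit on a data sample.
        `candidates` maps a name to a (compress, decompress) pair, where
        compress returns bytes and decompress returns an array."""
        sample = field.ravel()[:sample_size]
        best_name, best_score = None, -np.inf
        for name, (compress, decompress) in candidates.items():
            blob = compress(sample)
            recon = decompress(blob)
            bits_per_value = 8.0 * len(blob) / sample.size
            score = psnr(sample, recon) / bits_per_value
            if score > best_score:
                best_name, best_score = name, score
        return best_name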
State of the Art and Future Trends in Data Reduction for High-Performance Computing
TLDR
An overview is provided of the leveraging points found in high-performance computing (HPC) systems, suitable mechanisms to reduce data volumes, and their respective usage at the application and file-system layers.
Bit-Error Aware Quantization for DCT-based Lossy Compression
TLDR
This paper proposes a bit-efficient quantizer based on the DCTZ framework, develops a unique ordering mechanism based on the quantization table, and extends the encoding index, which can improve the compression ratio of the original DCTZ by 1.38x.

References

(Showing 1-10 of 35 references)
Exploration of Lossy Compression for Application-Level Checkpoint/Restart
TLDR
A lossy compression technique based on wavelet transformation for checkpoints is proposed, and its impact on application results is explored, showing that the overall checkpoint time including compression is reduced while the relative error remains fairly constant.
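A wavelet-based checkpoint compressor of this general kind decomposes the field, discards small detail coefficients, and reconstructs from the rest. The sketch below assumes the PyWavelets package and an arbitrary keep fraction; it illustrates the technique only and is not the scheme proposed in the paper.

    import numpy as np
    import pywt

    def wavelet_compress(field, wavelet="db4", level=4, keep_fraction=0.05):
        """Keep only the largest `keep_fraction` of wavelet coefficients;
        the sparse result is cheap to encode losslessly."""
        coeffs = pywt.wavedec(field, wavelet, level=level)
        flat = np.concatenate([c.ravel() for c in coeffs])
        threshold = np.quantile(np.abs(flat), 1.0 - keep_fraction)
        return [pywt.threshold(c, threshold, mode="hard") for c in coeffs]

    def wavelet_decompress(coeffs, wavelet="db4"):
        return pywt.waverec(coeffs, wavelet)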
Fast Error-Bounded Lossy HPC Data Compression with SZ
  • Sheng Di, F. Cappello
  • Computer Science
    2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
  • 2016
TLDR
This paper proposes a novel HPC data compression method that works very effectively on compressing large-scale HPC data sets; it is evaluated using 13 real-world HPC applications across different scientific domains and compared with many other state-of-the-art compression methods.
Fast and Efficient Compression of Floating-Point Data
TLDR
This work proposes a simple scheme for lossless, online compression of floating-point data that transparently integrates into the I/O of many applications, and achieves state-of-the-art compression rates and speeds.
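The core trick behind fast lossless floating-point coders of this family is to predict each value (for example, by its predecessor), XOR the prediction with the actual bit pattern, and then encode the many leading zero bytes of the residual compactly. The snippet below illustrates only the residual computation with a last-value predictor; it is not the paper's predictor or entropy stage.

    import numpy as np

    def xor_residuals(values):
        """XOR each float64 value with its predecessor; similar consecutive
        values yield residuals full of leading zero bytes."""
        bits = np.asarray(values, dtype=np.float64).view(np.uint64)
        prev = np.concatenate(([np.uint64(0)], bits[:-1]))
        resid = bits ^ prev
        # Leading-zero byte counts indicate how compressible the residuals are.
        lz_bytes = np.array([8 - (int(r).bit_length() + 7) // 8 for r in resid])
        return resid, lz_bytes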
Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization
TLDR
This work designs a new error-controlled lossy compression algorithm for large-scale scientific data, significantly improving the prediction hitting rate (or prediction accuracy) for each data point based on its nearby data values along multiple dimensions, and derives a series of multilayer prediction formulas and their unified formula in the context of data compression.
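The prediction-plus-quantization idea can be sketched in one dimension: predict each point from its already-reconstructed neighbor, quantize the prediction error with a linear quantizer of bin width 2*eb, and reconstruct on the fly so the pointwise error never exceeds eb. The sketch below is a simplified illustration of this general SZ-style approach, not the paper's multilayer multidimensional predictor.

    import numpy as np

    def quantize_1d(data, eb):
        """1-D last-value prediction with error-bounded linear quantization.
        Returns integer codes (to be entropy coded) and the reconstruction;
        every reconstructed value lies within eb of the original."""
        codes = np.empty(len(data), dtype=np.int64)
        recon = np.empty(len(data), dtype=np.float64)
        prev = 0.0                      # predict from the *reconstructed* neighbor
        for i, x in enumerate(data):
            code = int(np.round((x - prev) / (2.0 * eb)))
            codes[i] = code
            recon[i] = prev + code * 2.0 * eb
            prev = recon[i]
        return codes, recon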
NUMARCK: Machine Learning Algorithm for Resiliency and Checkpointing
TLDR
NUMARCK, the Northwestern University Machine learning Algorithm for Resiliency and Checkpointing, is proposed; it makes use of the emerging distributions of data changes between consecutive simulation iterations and encodes them into an indexing space that can be concisely represented.
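The encoding idea (bin the changes between two consecutive iterations and store compact bin indices plus a small table of bin centers) can be sketched as below. The uniform binning here is only an illustration; NUMARCK itself learns the bins from the observed change distribution, and the handling of zero-valued entries is simplified.

    import numpy as np

    def encode_changes(prev_iter, curr_iter, n_bins=256):
        """Bin the relative change of each value between two iterations."""
        ratio = (curr_iter - prev_iter) / np.where(prev_iter == 0, 1.0, prev_iter)
        edges = np.linspace(ratio.min(), ratio.max(), n_bins + 1)
        centers = (edges[:-1] + edges[1:]) / 2.0
        index = np.clip(np.digitize(ratio, edges) - 1, 0, n_bins - 1)
        return index.astype(np.uint8), centers     # 1 byte per value + tiny table

    def decode_changes(prev_iter, index, centers):
        return prev_iter * (1.0 + centers[index])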
ISABELA for effective in situ compression of scientific data
TLDR
The random nature of real‐valued scientific datasets renders lossless compression routines ineffective, and these techniques also impose significant overhead during decompression, making them unsuitable for data analysis and visualization, which require repeated data access.
Fast lossless compression of scientific floating-point data
TLDR
A new compression algorithm that is tailored to scientific computing environments where large amounts of floating-point data often need to be transferred between computers as well as to and from storage devices is described and evaluated.
Evaluating lossy data compression on climate simulation data within a large ensemble
TLDR
This paper reports on the results of a lossy data compression experiment with output from the CESM Large Ensemble (CESM-LE) Community Project, in which climate scientists are challenged to examine features of the data relevant to their interests, and to identify which of the ensemble members have been compressed and reconstructed.
Universal Numerical Encoder and Profiler Reduces Computing's Memory Wall with Software, FPGA, and SoC Implementations
  • Al Wegener
  • Computer Science
    2013 Data Compression Conference
  • 2013
TLDR
The computationally efficient and adaptive APplication AXceleration (APAX) numerical encoding method is presented to reduce the memory wall for integers and floating-point operands; it also quantifies the degree of uncertainty (accuracy) in numerical datasets.
A methodology for evaluating the impact of data compression on climate simulation data
TLDR
It is found that the diversity of the climate data requires the individual treatment of variables, and, in doing so, the reconstructed data can fall within the natural variability of the system, while achieving compression rates of up to 5:1.
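The acceptance criterion described here (reconstructed data falling within the natural variability of the system) can be illustrated by normalizing the compression-induced difference at each grid point by an ensemble-derived standard deviation. This is only a schematic of that kind of check, not the paper's actual methodology; all names are illustrative.

    import numpy as np

    def variability_normalized_error(orig, recon, ensemble_std):
        """Compare the compression error at each grid point to the ensemble
        standard deviation there; values well below 1 suggest the error is
        hidden within natural variability."""
        safe_std = np.where(ensemble_std == 0, np.inf, ensemble_std)
        z = np.abs(recon - orig) / safe_std
        return float(np.max(z)), float(np.mean(z))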