• Corpus ID: 226254456

Datasets for Benchmarking Floating-Point Compressors

@article{Knorr2020DatasetsFB,
  title={Datasets for Benchmarking Floating-Point Compressors},
  author={Fabian Knorr and Peter Thoman and Thomas Fahringer},
  journal={ArXiv},
  year={2020},
  volume={abs/2011.02849}
}
Compression of floating-point data, both lossy and lossless, is a topic of increasing interest in scientific computing. Developing and evaluating suitable compression algorithms requires representative samples of data from real-world applications. We present a collection of publicly accessible sources for volume and time series data as well as a list of concrete datasets that form an adequate basis for compressor benchmarking. 

ndzip: A High-Throughput Parallel Lossless Compressor for Scientific Data
TLDR
Ndzip is proposed, a high-throughput, lossless compression algorithm for multi-dimensional univariate regular grids of single- and double-precision floating point data that compresses and decompresses data at speeds close to main memory bandwidth, significantly outperforming existing schemes.
ndzip-gpu: efficient lossless compression of scientific floating-point data on GPUs
TLDR
Ndzip-gpu, a novel, highly efficient GPU parallelization scheme for the block compressor ndzip, which has recently set a new milestone in CPU floating-point compression speeds, is presented, and it is demonstrated that ndzip-gpu offers the best average compression ratio for the examined data.
CEAZ: accelerating parallel I/O via hardware-algorithm co-designed adaptive lossy compression
TLDR
This paper proposes an efficient Huffman coding approach that can adaptively update Huffman codewords online based on codewords generated offline, from a variety of representative scientific datasets, and derives a theoretical analysis to support a precise control of compression ratio under an error-bounded compression mode.

References

SHOWING 1-10 OF 23 REFERENCES
SPDP: An Automatically Synthesized Lossless Compression Algorithm for Floating-Point Data
TLDR
Over nine million algorithms were generated and the resulting algorithm, called SPDP, comprises four data transformations that operate exclusively at word or byte granularity, and reveals how to build effective compression algorithms for scientific data.
FPC: A High-Speed Compressor for Double-Precision Floating-Point Data
TLDR
FPC is described and evaluated, a fast lossless compression algorithm for linear streams of 64-bit floating-point data that works well on hard-to-compress scientific data sets and meets the throughput demands of high-performance systems.
Fast and Efficient Compression of Floating-Point Data
TLDR
This work proposes a simple scheme for lossless, online compression of floating-point data that transparently integrates into the I/O of many applications, and achieves state-of-the-art compression rates and speeds.
An Adaptive Prediction-Based Approach to Lossless Compression of Floating-Point Volume Data
  • Nathaniel Fout, K. Ma
  • Computer Science
    IEEE Transactions on Visualization and Computer Graphics
  • 2012
TLDR
The results demonstrate that the polynomial predictor, APE, is comparable to previous approaches in terms of speed but achieves better compression rates on average, and ACE, the combined predictor, is able to achieve the best compression rate on all datasets, with significantly better rates on most of the datasets.
MPC: A Massively Parallel Compression Algorithm for Scientific Data
TLDR
The Massively Parallel Compression (MPC) algorithm is derived, which requires almost no internal state, achieves heretofore unreached compression ratios on several data sets, and roughly matches the best CPU-based algorithms in compression ratio while outperforming them by one to two orders of magnitude in throughput.
Out‐of‐core compression and decompression of large n‐dimensional scalar fields
TLDR
A simple method for compressing very large and regularly sampled scalar fields based on the new Lorenzo predictor, which is well suited for out‐of‐core compression and decompression and often outperforms wavelet compression in an L∞ sense.
Celerity: High-Level C++ for Accelerator Clusters
In the face of ever-slowing single-thread performance growth for CPUs, the scientific and engineering communities increasingly turn to accelerator parallelization to tackle growing application
RTX-RSim: Accelerated Vulkan Room Response Simulation for Time-of-Flight Imaging
TLDR
This paper presents a new room impulse simulation method, implemented with Vulkan compute shaders and leveraging NVIDIA VKRay hardware raytracing, and extends this method to asynchronous streaming in order to overcome the limitations of on-board GPU memory when simulating very large scenes.
The Frontier Fields: Survey Design and Initial Results
What are the faintest distant galaxies we can see with the Hubble Space Telescope (HST) now, before the launch of the James Webb Space Telescope? This is the challenge taken up by the Frontier