• Corpus ID: 17085792

Improving the Speed of LZ77 Compression by Hashing and Suffix Sorting

@article{Sadakane2000ImprovingTS,
  title={Improving the Speed of LZ77 Compression by Hashing and Suffix Sorting},
  author={Kunihiko Sadakane and Hiroshi Imai},
  journal={IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences},
  year={2000},
  volume={83},
  pages={2689-2698}
}
  • K. Sadakane, H. Imai
  • Published 1 December 2000
  • Computer Science
  • IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Two new algorithms for improving the speed of LZ77 compression are proposed. One is based on a new hashing algorithm, named two-level hashing, that enables fast longest-match searching in a sliding dictionary; the other uses suffix sorting. The former is suitable for small dictionaries and significantly improves the speed of gzip, which uses a naive hashing algorithm. The latter is suitable for large dictionaries, which improve the compression ratio for large files. We also experiment on… 
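
The paper's two-level hashing scheme is not reproduced here, but the gzip-style hash-chain baseline it improves on can be sketched as follows. All names, the chain representation, and the parameters are illustrative, not taken from the paper:

```python
# Sketch of a gzip-style hash-chain longest-match search in a sliding
# dictionary (the "naive hashing" baseline the paper speeds up).
MIN_MATCH = 3

def insert_position(data, pos, head, prev, mask):
    """Index the MIN_MATCH-byte string starting at pos into the hash chains."""
    h = hash(data[pos:pos + MIN_MATCH]) & mask
    prev[pos] = head.get(h, -1)   # link to previous position with same hash
    head[h] = pos

def longest_match(data, pos, head, prev, mask, window=32768):
    """Walk the chain for the current prefix; return (length, distance)."""
    if pos + MIN_MATCH > len(data):
        return 0, 0
    h = hash(data[pos:pos + MIN_MATCH]) & mask
    best_len, best_dist = 0, 0
    cand = head.get(h, -1)
    while cand >= 0 and pos - cand <= window:
        length = 0
        while pos + length < len(data) and data[cand + length] == data[pos + length]:
            length += 1
        if length > best_len:
            best_len, best_dist = length, pos - cand
        cand = prev.get(cand, -1)
    return best_len, best_dist
```

The weakness the paper targets is visible in the inner loops: with a naive hash, many chain entries share a bucket without sharing a long common prefix, so the search does redundant byte comparisons.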

A Complete Suffix Array-Based String Match Search Algorithm of Sliding Windows

  • Lu Wang, Kun Huang, Jian Zhang, Jin Yao
  • Computer Science
    2012 Fifth International Symposium on Computational Intelligence and Design
  • 2012
The concept of a complete suffix array and a new sliding-window search algorithm for string matching, based on the orderly construction of suffix arrays, are proposed so that the suffix array need not be rebuilt every time the window slides.

Parallel Decompression of Gzip-Compressed Files and Random Access to DNA Sequences

  • Mael Kerbiriou, R. Chikhi
  • Computer Science
    2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
  • 2019
A parallel algorithm and implementation, pugz, that performs fast and exact decompression of any text file is proposed; it is shown to be an order of magnitude faster than gunzip and 5x faster than a highly optimized sequential implementation (libdeflate).

Long Distance Redundancy Reduction in Thin Client Computing

  • Sun-Jin Yang, T. T. Tiow
  • Computer Science
    6th IEEE/ACIS International Conference on Computer and Information Science (ICIS 2007)
  • 2007
This paper presents a different way to extend the history buffer and a history extension scheme based on it, aimed at minimizing long-distance redundancies, and empirically studies the effectiveness of this scheme on screen updates generated by one of the most bandwidth-efficient thin client systems, Microsoft Terminal Services.

Secure Lempel-Ziv compression with embedded encryption

An encryption scheme called the Randomized Dictionary Table (RDT), which embeds encryption into the LZ78 data compression method, is proposed and analyzed; it achieves high security strength under both the ciphertext-only attack and the known/chosen-plaintext attack.

Parallel I/O on Compressed Data Files: Semantics, Algorithms, and Performance Evaluation

  • S. Singh, E. Gabriel
  • Computer Science
    2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID)
  • 2020
The paper details the handling of individual read and write operations on compressed data files, presents an extension to the two-phase collective I/O algorithm to support data compression, and demonstrates significant performance benefits when using data compression on a parallel BeeGFS file system.

Efficient compression of large repetitive strings

This research presents a meta-compression framework that automates the very labor-intensive and therefore time-heavy and expensive process of manually cataloging and extracting compressors from data.

Analysis and reduction of data spikes in thin client computing

Lossy dictionary-based image compression method

Research of an image map encoding algorithm on frame buffer

An original data encoding algorithm, the image map encoding algorithm, based on the traditional LZ77 algorithm is proposed; it can reduce the frequency of access from the LCD controller to the frame buffer, thus effectively decreasing the power consumption of the whole system.

Rapid Development of Gzip with MaxJ

The gzip design in MaxJ presented here took only one man-month to develop and achieved better performance than the related work created in Verilog and OpenCL.

References

SHOWING 1-10 OF 15 REFERENCES

Extended application of suffix trees to data compression

  • N. Larsson
  • Computer Science
    Proceedings of Data Compression Conference - DCC '96
  • 1996
It is shown that the scheme can be applied to PPM-style compression, obtaining an algorithm that runs in linear time, and in space bounded by an arbitrarily chosen window size.

Longest-match string searching for Ziv-Lempel compression

Eight data structures that can be used to accelerate the searching, including adaptations of four methods normally used for exact matching, are presented, indicating the trade-offs available between compression speed and memory consumption.

Suffix arrays: a new method for on-line string searches

A new and conceptually simple data structure, called a suffix array, for on-line string searches is introduced in this paper, and it is believed that suffix arrays will prove to be better in practice than suffix trees for many applications.
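
The core idea of Manber and Myers' structure can be illustrated with a minimal sketch: sort all suffixes once, then answer pattern queries by binary search. The naive O(n² log n) construction below is for clarity only; their paper gives an O(n log n) construction:

```python
# Minimal suffix-array sketch: construction by sorting suffixes, then
# lookup by binary search over the sorted suffix order.
def build_suffix_array(text):
    """Return indices of the suffixes of `text` in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrence(text, sa, pattern):
    """Binary-search the suffix array; return one start index of pattern, or -1."""
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(sa) and text[sa[lo]:sa[lo] + len(pattern)] == pattern:
        return sa[lo]
    return -1
```

Because all occurrences of a pattern occupy a contiguous range of the suffix array, the same binary search extends naturally to reporting every match, which is what makes the structure attractive for LZ77 longest-match search over large dictionaries.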

The effect of non-greedy parsing in Ziv-Lempel compression methods

  • R. N. Horspool
  • Computer Science
    Proceedings DCC '95 Data Compression Conference
  • 1995
Practical implementations for using non-greedy parsing in LZ77 and LZ78 compression are explored and some experimental measurements are presented.
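
One-step lazy evaluation, the simplest practical form of non-greedy parsing (and the form gzip implements), can be sketched as follows. The `longest_match` argument is a hypothetical helper returning the best (length, distance) match at a position; token shapes are illustrative:

```python
# Sketch of one-step lazy (non-greedy) LZ77 parsing: before committing
# to a match, check whether starting one byte later yields a strictly
# longer match; if so, emit a literal and defer.
def lazy_parse(data, longest_match, min_match=3):
    """Return a token list of ('lit', byte) and ('match', length, distance)."""
    tokens, pos = [], 0
    while pos < len(data):
        length, dist = longest_match(data, pos)
        if length >= min_match and pos + 1 < len(data):
            next_len, _ = longest_match(data, pos + 1)
            if next_len > length:              # deferring wins: emit a literal
                tokens.append(('lit', data[pos]))
                pos += 1
                continue
        if length >= min_match:
            tokens.append(('match', length, dist))
            pos += length
        else:
            tokens.append(('lit', data[pos]))
            pos += 1
    return tokens
```

The trade-off measured in papers like this one is that each deferral costs an extra match search, so compression improves at some cost in speed.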

A corpus for the evaluation of lossless compression algorithms

  • R. Arnold, T. Bell
  • Computer Science
    Proceedings DCC '97. Data Compression Conference
  • 1997
A principled technique for collecting a corpus of test data for compression methods is developed; a corpus, called the Canterbury corpus, is constructed using this technique, and the performance of a collection of compression methods on the new corpus is reported.

A Block-sorting Lossless Data Compression Algorithm

A block-sorting lossless data compression algorithm is described, and the performance of its implementation is compared with widely available data compressors running on the same hardware.

Modifications of the Burrows and Wheeler data compression algorithm

Based on the context tree model, the specific statistical properties of the data at the output of the BWT are considered, leading to modifications of the coding method that improve the coding efficiency.

The sliding-window Lempel-Ziv algorithm is asymptotically optimal

The sliding-window version of the Lempel-Ziv data-compression algorithm is described, and it is shown that as the "window size," a quantity related to the memory and complexity of the procedure, goes to infinity, the compression rate approaches the source entropy.

A universal algorithm for sequential data compression

The compression ratio achieved by the proposed universal code uniformly approaches the lower bounds on the compression ratios attainable by block-to-variable codes and variable-to-block codes designed to match a completely specified source.

Fast hashing of variable-length text strings

Using only a few simple and commonplace instructions, this algorithm efficiently maps variable-length text strings to small integers.
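
This reference describes a table-lookup hash in the style of Pearson hashing: one XOR and one table lookup per input byte. A minimal sketch, with the permutation table and seed chosen here for illustration:

```python
# Sketch of a Pearson-style table-lookup hash of a variable-length byte
# string: the state is a single byte, updated by XOR and a table lookup.
import random

rng = random.Random(0)        # fixed seed so the table is reproducible
TABLE = list(range(256))
rng.shuffle(TABLE)            # any permutation of 0..255 works

def pearson_hash(data: bytes) -> int:
    """Map a variable-length byte string to an 8-bit integer."""
    h = 0
    for b in data:
        h = TABLE[h ^ b]
    return h
```

The relevance to LZ77 is direct: a fast, well-mixing hash of short variable-length strings is exactly what a hash-chain match finder needs when indexing the sliding dictionary.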