# Improving the Speed of LZ77 Compression by Hashing and Suffix Sorting

@article{Sadakane2000ImprovingTS, title={Improving the Speed of LZ77 Compression by Hashing and Suffix Sorting}, author={Kunihiko Sadakane and Hiroshi Imai}, journal={IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences}, year={2000}, volume={83}, pages={2689-2698} }

Two new algorithms for improving the speed of LZ77 compression are proposed. One is based on a new hashing algorithm, named two-level hashing, that enables fast longest-match searching in a sliding dictionary; the other uses suffix sorting. The former is suitable for small dictionaries and significantly improves the speed of gzip, which uses a naive hashing algorithm. The latter is suitable for large dictionaries, which improve the compression ratio for large files. We also experiment on…
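
The abstract contrasts two-level hashing with the "naive" hashing used by gzip. The two-level scheme itself is not described on this page, so the following is only a minimal Python sketch of the baseline it improves on: a gzip-style hash chain in which `head` holds the most recent position for each hash of the next three bytes and `prev` links each position to the previous one with the same hash. The hash function, the class and method names, and the constants `HASH_BITS`, `WINDOW`, and `MAX_CHAIN` are illustrative assumptions chosen to resemble gzip's defaults, not gzip's actual code.

```python
MIN_MATCH = 3        # gzip hashes and matches strings of at least 3 bytes
MAX_MATCH = 258      # gzip's maximum match length
HASH_BITS = 15       # assumption: 2**15 hash buckets
WINDOW = 1 << 15     # assumption: 32 KiB sliding dictionary
MAX_CHAIN = 128      # assumption: cap on candidates examined per search


def hash3(data: bytes, i: int) -> int:
    """Hash the 3 bytes data[i:i+3] into HASH_BITS bits.

    Illustrative multiplicative hash; gzip itself uses a rolling shift-xor hash.
    """
    v = (data[i] << 16) | (data[i + 1] << 8) | data[i + 2]
    return ((v * 2654435761) & 0xFFFFFFFF) >> (32 - HASH_BITS)


class HashChainMatcher:
    def __init__(self, data: bytes):
        self.data = data
        self.head = {}                   # hash bucket -> most recent position
        self.prev = [-1] * len(data)     # position -> older position, same bucket

    def insert(self, i: int) -> None:
        """Make position i available as a future match candidate."""
        if i + MIN_MATCH <= len(self.data):
            h = hash3(self.data, i)
            self.prev[i] = self.head.get(h, -1)
            self.head[h] = i

    def longest_match(self, i: int) -> tuple:
        """Return (distance, length) of the longest match at i, or (0, 0).

        Call this before insert(i) so that only earlier positions are candidates.
        """
        data, n = self.data, len(self.data)
        if i + MIN_MATCH > n:
            return 0, 0
        best_len, best_dist, steps = 0, 0, 0
        cand = self.head.get(hash3(data, i), -1)
        # Chains run from newest to oldest, so stop once a candidate leaves the window.
        while cand >= 0 and i - cand <= WINDOW and steps < MAX_CHAIN:
            # Hash collisions are possible, so every candidate is verified bytewise;
            # the match may run past position i, as LZ77 allows.
            length = 0
            while (length < MAX_MATCH and i + length < n
                   and data[cand + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - cand
            cand = self.prev[cand]
            steps += 1
        return (best_dist, best_len) if best_len >= MIN_MATCH else (0, 0)


data = b"abcabcabcabc"
m = HashChainMatcher(data)
for pos in range(3):
    m.insert(pos)
print(m.longest_match(3))  # (3, 9)
```

Every candidate on a chain is compared byte by byte, so long chains in repetitive data make this search slow; `MAX_CHAIN` bounds the work at some cost in match quality.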


## 12 Citations

### A Complete Suffix Array-Based String Match Search Algorithm of Sliding Windows

- Computer Science
- 2012 Fifth International Symposium on Computational Intelligence and Design
- 2012

The concept of a completed suffix array and a new sliding-window string-matching algorithm based on the ordered construction of suffix arrays are proposed, so that the suffix array need not be rebuilt every time the window moves.

### Parallel Decompression of Gzip-Compressed Files and Random Access to DNA Sequences

- Computer Science
- 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
- 2019

A parallel algorithm and its implementation, pugz, which performs fast and exact decompression of any text file, are proposed; pugz is shown to be an order of magnitude faster than gunzip and 5x faster than a highly optimized sequential implementation (libdeflate).

### Long Distance Redundancy Reduction in Thin Client Computing

- Computer Science
- 6th IEEE/ACIS International Conference on Computer and Information Science (ICIS 2007)
- 2007

This paper presents a different way to extend the history buffer and a history extension scheme based on it, aimed at minimizing long-distance redundancies, and empirically studies the effectiveness of this scheme on screen updates generated by Microsoft Terminal Service, one of the most bandwidth-efficient thin client systems.

### Secure Lempel-Ziv compression with embedded encryption

- Computer Science
- IS&T/SPIE Electronic Imaging
- 2005

An encryption scheme called the Randomized Dictionary Table (RDT), which embeds encryption into the LZ78 data compression method, is proposed and analyzed; it achieves high security strength under both ciphertext-only and known/chosen-plaintext attacks.

### Parallel I/O on Compressed Data Files: Semantics, Algorithms, and Performance Evaluation

- Computer Science
- 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID)
- 2020

The paper details the handling of individual read and write operations on compressed data files, presents an extension to the two-phase collective I/O algorithm to support data compression, and demonstrates significant performance benefits when using data compression on a parallel BeeGFS file system.

### Efficient compression of large repetitive strings

- Computer Science
- 2015

This research presents a meta-compression framework that automates the labor-intensive, time-consuming, and expensive process of manually cataloging and extracting compressors from data.

### Analysis and reduction of data spikes in thin client computing

- Computer Science
- J. Parallel Distributed Comput.
- 2008

### Research of an image map encoding algorithm on frame buffer

- Computer Science
- 2007 7th International Conference on ASIC
- 2007

An original data encoding algorithm based on the traditional LZ77 algorithm, called the image map encode algorithm, is proposed; it reduces how often the LCD controller accesses the frame buffer, thereby effectively decreasing the power consumption of the whole system.

### Rapid Development of Gzip with MaxJ

- Computer Science
- ARC
- 2017

The gzip design in MaxJ presented here took only one man-month to develop and achieved better performance than the related work created in Verilog and OpenCL.

## References

Showing 1-10 of 15 references

### Extended application of suffix trees to data compression

- Computer Science
- Proceedings of Data Compression Conference - DCC '96
- 1996

It is shown that the scheme can be applied to PPM-style compression, yielding an algorithm that runs in linear time and in space bounded by an arbitrarily chosen window size.

### Longest-match string searching for Ziv-Lempel compression

- Computer Science
- Softw. Pract. Exp.
- 1993

Eight data structures that can be used to accelerate the search are presented, including adaptations of four methods normally used for exact-match searching, and the trade-offs they offer between compression speed and memory consumption are indicated.

### Suffix arrays: a new method for on-line string searches

- Computer Science
- SODA '90
- 1990

A new and conceptually simple data structure for on-line string searches, called a suffix array, is introduced in this paper, and it is believed that suffix arrays will prove to be better in practice than suffix trees for many applications.
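
As a rough illustration of how sorted suffixes can answer the LZ77 longest-match query, here is a minimal Python sketch. It is not the algorithm of either paper: the suffix array is built by naively sorting suffixes (O(n^2 log n) in the worst case), and the function names and the `window` parameter are hypothetical. The lookup relies on a standard property: the longest common prefix between suffix i and another suffix can only shrink as the other suffix moves farther away in sorted order, so the best earlier match is the nearest qualifying neighbour above or below i.

```python
from __future__ import annotations


def suffix_array(text: bytes) -> list[int]:
    """Start positions of all suffixes, sorted lexicographically (naive)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp(text: bytes, a: int, b: int) -> int:
    """Length of the longest common prefix of text[a:] and text[b:]."""
    k = 0
    while a + k < len(text) and b + k < len(text) and text[a + k] == text[b + k]:
        k += 1
    return k


def longest_previous_match(text: bytes, sa: list[int], rank: list[int],
                           i: int, window: int) -> tuple[int, int]:
    """Longest match for text[i:] starting in [i - window, i): (position, length)."""
    best_pos, best_len = -1, 0
    for step in (-1, 1):                # look upward, then downward, in sorted order
        r = rank[i] + step
        while 0 <= r < len(sa):
            j = sa[r]
            if i - window <= j < i:     # nearest valid suffix in this direction
                length = lcp(text, i, j)
                if length > best_len:
                    best_pos, best_len = j, length
                break                   # farther suffixes cannot share more
            r += step
    return best_pos, best_len


text = b"banana"
sa = suffix_array(text)                 # [5, 3, 1, 0, 4, 2]
rank = [0] * len(sa)
for r, pos in enumerate(sa):
    rank[pos] = r
print(longest_previous_match(text, sa, rank, 3, window=len(text)))  # (1, 3)
```

The scan for a valid neighbour can still touch many entries when most nearby suffixes lie outside the window; the sketch favours clarity over bounded cost.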

### The effect of non-greedy parsing in Ziv-Lempel compression methods

- Computer Science
- Proceedings DCC '95 Data Compression Conference
- 1995

Practical implementations for using non-greedy parsing in LZ77 and LZ78 compression are explored and some experimental measurements are presented.
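
One common form of non-greedy parsing is the one-step "lazy" evaluation used by gzip: before committing to a match at position i, check whether position i + 1 has a strictly longer match, and if so emit a literal instead. The following is a minimal Python illustration with a brute-force match search; the token format, function names, and default window size are assumptions, and this is only one of several possible non-greedy strategies.

```python
def longest_match(data: bytes, i: int, window: int, min_len: int = 3) -> tuple:
    """Brute-force longest earlier match at i: (distance, length), or (0, 0)."""
    best_dist, best_len = 0, 0
    for j in range(max(0, i - window), i):
        k = 0
        while i + k < len(data) and data[j + k] == data[i + k]:
            k += 1
        if k > best_len:
            best_dist, best_len = i - j, k
    return (best_dist, best_len) if best_len >= min_len else (0, 0)


def lazy_parse(data: bytes, window: int = 4096) -> list:
    """One-step lazy parsing into ("lit", byte) and ("match", distance, length)."""
    out, i = [], 0
    while i < len(data):
        dist, length = longest_match(data, i, window)
        if length and i + 1 < len(data):
            _, next_len = longest_match(data, i + 1, window)
            if next_len > length:
                # A longer match starts one byte later: emit a literal and defer.
                out.append(("lit", data[i]))
                i += 1
                continue
        if length:
            out.append(("match", dist, length))
            i += length
        else:
            out.append(("lit", data[i]))
            i += 1
    return out


print(lazy_parse(b"abcbcdeabcde")[-2:])  # [('lit', 97), ('match', 5, 4)]
```

On this input, greedy parsing would take the 3-byte match at position 7 and then need two literals for `de`; the lazy parse emits one literal and a 4-byte match instead.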

### A corpus for the evaluation of lossless compression algorithms

- Computer Science
- Proceedings DCC '97. Data Compression Conference
- 1997

A principled technique for collecting a corpus of test data for compression methods is developed and used to build a corpus called the Canterbury corpus, and the performance of a collection of compression methods on the new corpus is reported.

### A Block-sorting Lossless Data Compression Algorithm

- Computer Science
- 1994

A block-sorting lossless data compression algorithm and its implementation are described, and the performance of the implementation is compared with that of widely available data compressors running on the same hardware.
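
The block-sorting transform can be stated compactly: sort all cyclic rotations of a block and output the last column together with the row index of the original block. A minimal Python sketch follows; it sorts rotations naively rather than with the faster suffix-sorting methods a practical implementation would use, and the function names are hypothetical.

```python
from __future__ import annotations


def bwt(block: bytes) -> tuple[bytes, int]:
    """Return (last column, row of the original block) for a non-empty block."""
    n = len(block)
    rows = sorted(range(n), key=lambda i: block[i:] + block[:i])
    last = bytes(block[(i - 1) % n] for i in rows)   # byte preceding each rotation
    return last, rows.index(0)


def inverse_bwt(last: bytes, index: int) -> bytes:
    """Undo bwt() using the last-to-first column correspondence."""
    n = len(last)
    # order[j]: index in `last` of the j-th byte of the sorted first column
    # (a stable sort keeps equal bytes in the same relative order). That row's
    # rotation begins one position after row j's, so following `order` walks
    # the block forwards.
    order = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    p = index
    for _ in range(n):
        p = order[p]
        out.append(last[p])
    return bytes(out)


print(bwt(b"abraca"))               # (b'caraab', 1)
print(inverse_bwt(b"caraab", 1))    # b'abraca'
```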

### Modifications of the Burrows and Wheeler data compression algorithm

- Computer Science
- Proceedings DCC'99 Data Compression Conference (Cat. No. PR00096)
- 1999

Based on the context tree model, the specific statistical properties of the data at the output of the BWT are considered; these lead to modifications of the coding method that improve coding efficiency.

### The sliding-window Lempel-Ziv algorithm is asymptotically optimal

- Computer Science
- Proc. IEEE
- 1994

The sliding-window version of the Lempel-Ziv data-compression algorithm is described, and it is shown that as the "window size," a quantity related to the memory and complexity of the procedure, goes to infinity, the compression rate approaches the source entropy.
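
The sliding-window scheme can be illustrated with a greedy parser that emits the classic LZ77 triples (distance back into the window, match length, next byte), plus the matching decoder. This is a minimal Python sketch with a brute-force search over the window; `window` and `max_len` are arbitrary illustrative limits, and practical encoders such as gzip use a different token format.

```python
def lz77_encode(data: bytes, window: int = 4096, max_len: int = 255) -> list:
    """Greedy sliding-window parse into (distance, length, next_byte) triples."""
    out, i = [], 0
    while i < len(data):
        best_dist, best_len = 0, 0
        for j in range(max(0, i - window), i):      # every start in the window
            k = 0
            # Stop one byte early so an explicit next byte always remains.
            while (k < max_len and i + k < len(data) - 1
                   and data[j + k] == data[i + k]):
                k += 1
            if k > best_len:
                best_dist, best_len = i - j, k
        out.append((best_dist, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decode(triples) -> bytes:
    buf = bytearray()
    for dist, length, nxt in triples:
        start = len(buf) - dist
        for k in range(length):      # byte-by-byte copy also handles overlaps
            buf.append(buf[start + k])
        buf.append(nxt)
    return bytes(buf)


print(lz77_encode(b"aaaa"))  # [(0, 0, 97), (1, 2, 97)]
```

The second triple copies two bytes from distance 1, overlapping the bytes it is producing; the byte-by-byte decoder handles this naturally.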

### A universal algorithm for sequential data compression

- Computer Science
- IEEE Trans. Inf. Theory
- 1977

The compression ratio achieved by the proposed universal code uniformly approaches the lower bounds on the compression ratios attainable by block-to-variable codes and variable-to-block codes designed to match a completely specified source.

### Fast hashing of variable-length text strings

- Computer Science
- CACM
- 1990

Using only a few simple and commonplace instructions, this algorithm efficiently maps variable-length text strings onto small integers.
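
The hashing scheme in that paper, now usually called Pearson hashing, updates an 8-bit hash by indexing a 256-entry permutation table with the running hash XORed with each input byte. A minimal Python sketch follows; the table is a hypothetical pseudo-random permutation generated with a fixed seed, not the table printed in the paper.

```python
import random

# Hypothetical table: any permutation of 0..255 works. A fixed seed makes it
# reproducible; this is NOT the table published in the paper.
T = list(range(256))
random.Random(1990).shuffle(T)


def pearson_hash(data: bytes) -> int:
    """Map a byte string of any length to an integer in 0..255."""
    h = 0
    for c in data:
        h = T[h ^ c]        # one XOR and one table lookup per input byte
    return h


print(pearson_hash(b"longest match"))
```

Because `T` is a permutation, distinct one-byte strings always hash to distinct values.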