An Online Algorithm for Lightweight Grammar-Based Compression

Abstract

Grammar-based compression is a well-studied technique that constructs a small context-free grammar (CFG) uniquely deriving a given text. In this paper, we present an online algorithm for lightweight grammar-based compression. Our algorithm is based on the LCA algorithm [Sakamoto et al. 2004], which guarantees a nearly optimum compression ratio and space. LCA, however, is an offline algorithm and relies on external memory to reduce its space consumption. We therefore present an online version that inherits most characteristics of the original LCA. For a text of length $n$, our algorithm guarantees an $O(\log^2 n)$ approximation ratio with respect to the optimum grammar size, and all work is carried out in main memory whose size is bounded by the output size. In addition, we propose a more practical encoding based on the parentheses representation of a binary tree. Experimental results on repetitive texts demonstrate that our algorithm achieves effective compression compared with other practical compressors, and that its space consumption is smaller than the input text size.
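The abstract does not spell out LCA's parsing rule or the exact encoding, so the following is only a rough, hypothetical illustration of the two ideas it names: a grammar in which every rule has the form X -> YZ, and a bit-string encoding of that grammar's binary derivation tree. The sketch pairs adjacent symbols naively by position (not LCA's locally consistent parsing) and writes the tree in preorder, with '1' for a nonterminal expanded for the first time and '0' for a leaf (a terminal or a back-reference to an earlier nonterminal). The names `build_grammar`, `expand`, `encode` and `decode` are illustrative assumptions, not the authors' code.

```python
from typing import Dict, List, Tuple, Union

# Toy representation (an assumption for this sketch): terminals are
# one-character strings, nonterminals are ints, and every rule is X -> Y Z.
Symbol = Union[str, int]
Rules = Dict[int, Tuple[Symbol, Symbol]]


def build_grammar(text: str) -> Tuple[Rules, Symbol]:
    """Pair adjacent symbols level by level until one symbol remains.

    Identical pairs reuse the same nonterminal. This naive, position-based
    pairing only stands in for LCA's parsing rule.
    """
    assert text, "the text must be non-empty"
    rules: Rules = {}
    pair_to_nt: Dict[Tuple[Symbol, Symbol], int] = {}
    seq: List[Symbol] = list(text)
    while len(seq) > 1:
        nxt: List[Symbol] = []
        for i in range(0, len(seq) - 1, 2):
            pair = (seq[i], seq[i + 1])
            if pair not in pair_to_nt:
                nt = len(rules)
                pair_to_nt[pair] = nt
                rules[nt] = pair
            nxt.append(pair_to_nt[pair])
        if len(seq) % 2 == 1:
            nxt.append(seq[-1])  # an unpaired last symbol moves up unchanged
        seq = nxt
    return rules, seq[0]


def expand(rules: Rules, sym: Symbol) -> str:
    """Derive the string generated by `sym`."""
    if isinstance(sym, str):
        return sym
    left, right = rules[sym]
    return expand(rules, left) + expand(rules, right)


def encode(rules: Rules, root: Symbol) -> Tuple[str, List[Symbol]]:
    """Preorder bits of the pruned derivation tree plus its leaf labels."""
    bits: List[str] = []
    leaves: List[Symbol] = []
    new_id: Dict[int, int] = {}  # original nonterminal -> preorder id

    def visit(sym: Symbol) -> None:
        if isinstance(sym, int) and sym not in new_id:
            new_id[sym] = len(new_id)  # ids are handed out in preorder
            bits.append("1")
            left, right = rules[sym]
            visit(left)
            visit(right)
        else:
            bits.append("0")  # terminal, or nonterminal already expanded
            leaves.append(sym if isinstance(sym, str) else new_id[sym])

    visit(root)
    return "".join(bits), leaves


def decode(bits: str, leaves: List[Symbol]) -> Tuple[Rules, Symbol]:
    """Rebuild an equivalent grammar from the output of `encode`."""
    rules: Rules = {}
    leaf_iter = iter(leaves)
    pos = 0
    next_id = 0

    def parse() -> Symbol:
        nonlocal pos, next_id
        bit = bits[pos]
        pos += 1
        if bit == "0":
            return next(leaf_iter)
        nt = next_id  # same preorder numbering as encode()
        next_id += 1
        left = parse()
        right = parse()
        rules[nt] = (left, right)
        return nt

    root = parse()
    return rules, root


if __name__ == "__main__":
    rules, root = build_grammar("abababab")
    print(rules)                # {0: ('a', 'b'), 1: (0, 0), 2: (1, 1)}
    print(encode(rules, root))  # ('1110000', ['a', 'b', 2, 1])
```

Because every node of the pruned tree is either a full internal node or a leaf, a tree with k internal nodes yields exactly 2k + 1 bits. Note also that the pairing here depends on absolute position, so the same substring starting at an odd offset can receive different rules; locally consistent parsing of the kind LCA uses is intended to avoid exactly that.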

DOI: 10.1109/CCP.2011.40


10 Figures and Tables

Cite this paper

@article{Maruyama2011AnOA,
  title={An Online Algorithm for Lightweight Grammar-Based Compression},
  author={Shirou Maruyama and Masayuki Takeda and Masaya Nakahara and Hiroshi Sakamoto},
  journal={2011 First International Conference on Data Compression, Communications and Processing},
  year={2011},
  pages={19-28}
}