XML Tree Structure Compression

@article{Maneth2008XMLTS,
  title={XML Tree Structure Compression},
  author={Sebastian Maneth and Nikolay L Mihaylov and Sherif Sakr},
  journal={2008 19th International Workshop on Database and Expert Systems Applications},
  year={2008},
  pages={243--247}
}
  • S. Maneth, N. Mihaylov, S. Sakr
  • Published 1 September 2008
  • Computer Science
  • 2008 19th International Workshop on Database and Expert Systems Applications
In an XML document a considerable fraction consists of markup, that is, begin and end-element tags describing the document's tree structure. XML compression tools such as XMill separate the tree structure from the data content and compress each separately. The main focus in these compression tools is how to group similar data content together prior to performing standard data compression such as gzip, bzip2, or ppm. In contrast, the focus of this paper is on compressing the tree structure part… 
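The separation step mentioned in the abstract can be illustrated with a minimal sketch. It is not XMill's actual container format or the paper's method: the function names `split_structure_and_content` and `compress_separately` are hypothetical, `zlib` stands in for gzip, and attributes and the positions of text within the tree are ignored, so the split shown here is not reversible.

```python
import zlib
import xml.etree.ElementTree as ET


def split_structure_and_content(xml_text):
    """Return (structure, content) as two separate strings.

    structure: the begin and end-element tags only (the tree shape).
    content:   the non-whitespace character data, one value per line.
    Assumption: attributes and text positions are dropped for brevity.
    """
    structure = []
    content = []

    def walk(node):
        structure.append("<" + node.tag + ">")
        if node.text and node.text.strip():
            content.append(node.text.strip())
        for child in node:
            walk(child)
            # Text that follows a child element belongs to the parent.
            if child.tail and child.tail.strip():
                content.append(child.tail.strip())
        structure.append("</" + node.tag + ">")

    walk(ET.fromstring(xml_text))
    return "".join(structure), "\n".join(content)


def compress_separately(xml_text):
    """Compress the structure stream and the content stream independently."""
    structure, content = split_structure_and_content(xml_text)
    return (zlib.compress(structure.encode("utf-8")),
            zlib.compress(content.encode("utf-8")))


doc = ("<lib><book><title>A</title><year>2008</year></book>"
       "<book><title>B</title><year>2009</year></book></lib>")
structure, content = split_structure_and_content(doc)
packed_structure, packed_content = compress_separately(doc)
```

For the sample document, `structure` holds only the nested tags and `content` holds the four text values, so a structure-specific compressor could be applied to the first stream without touching the second.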

Optimizing XML Compression
TLDR
It is shown that finding an optimal compression configuration with respect to compression gain is an NP-hard optimization problem, and an approximation algorithm for selecting a partitioning strategy for document content based on the branch-and-bound paradigm is described.
XML tree structure compression using RePair
Fast and Tiny Structural Self-Indexes for XML
TLDR
A fully-fledged index over grammar-compressed trees, previously used as a synopsis for structural XPath queries, is presented; it allows arbitrary tree algorithms to be executed with a slow-down that is comparable to the space improvement.
RFXFreeze: A non-queriable compressor for RFX storage structure
TLDR
This paper proposes a non-queriable compressor for the existing RFX structure which attains a high compression ratio at the cost of time for efficient retrieval of data.
Tree Structure Compression with RePair
TLDR
The new algorithm (TreeRePair) produces straight-line linear context-free tree (SLT) grammars which are smaller than those produced by previous grammar-based compressors such as BPLEX and give compression ratios comparable to the best known XML file compressors.
Retrieving information from compressed XML documents according to vague queries
TLDR
The aim of this thesis is to present the design of a system named “XML Compressing and Vague Querying (XCVQ)”, which can compress an XML document and, given vague queries, retrieve the required information from the compressed version with less decompression.
Ranking Tagged Resources Using Social Semantic Relevance
TLDR
This study designs XMLCQ, a system for compressing and querying XML documents, which compresses an XML document without needing its schema or DTD, to minimize the number of technologies associated with these documents.
A framework of summarizing XML documents with schemas
TLDR
A framework for summarizing an XML document based on both the document itself and its schema is given; the schema is used because it implies much important semantic and structural information.
Structural XML Query Processing
TLDR
To the best of the authors' knowledge, this is the first work that provides a detailed description of XML query processing techniques that are related to structural aspects and that contains information about their theoretical and practical features as well as about their mutual compatibility and general usability.
Unification on Compressed Terms
TLDR
It is proved that the first-order unification of compressed terms is decidable in polynomial time, and also that a compressed representation of the most general unifier can be computed in polynomial time.

References

Efficient memory representation of XML document trees
AXECHOP: a grammar-based compressor for XML
TLDR
A compression scheme tailored specifically to XML, named AXECHOP, is presented, which generates a context-free grammar capable of deriving the original structure of the document, with the output passed through an adaptive arithmetic coder before being written to the compressed file.
Path Queries on Compressed XML
Compressing XML with multiplexed hierarchical PPM models
  • J. Cheney
  • Computer Science
  • Proceedings DCC 2001. Data Compression Conference
  • 2001
TLDR
A working Extensible Markup Language (XML) compression benchmark is established, and it is found that bzip2 compresses XML best, albeit more slowly than gzip, and an online binary encoding for XML called Encoded SAX (ESAX) that compresses better and faster than existing methods is described.
TREECHOP: A Tree-based Query-able Compressor for XML
TLDR
This paper presents a novel technique for lossless XML compression, called TREECHOP, which supports querying of compressed XML data without requiring full decompression, and requires only a single pass over the input document during the compression process.
Supporting efficient query processing on compressed XML files
TLDR
By organizing the compression result as a set of context-free grammar rules, the scheme supports efficient processing of XPath queries without decompression and achieves a compression ratio comparable to that of gzip, while its query processing time is among the best of existing algorithms.
XGrind: a query-friendly XML compressor
TLDR
Performance evaluations over a variety of XML documents and user queries indicate that XGrind simultaneously delivers improved query processing times and reasonable compression ratios.
Engineering succinct DOM
TLDR
The engineering of Succinct DOM is described, a DOM implementation, written in C++, which is suitable for in-memory representation of large static XML documents, and is based upon succinct data structures, which use an information-theoretically minimum amount of space to represent an object.
Tree Transducers and Tree Compressions
TLDR
A tree can be compressed into a DAG by sharing common subtrees, but a more powerful way of tree compression is to allow the sharing of tree patterns, i.e., internal parts of the tree.
XMill: an efficient compressor for XML data
We describe a tool for compressing XML data, with applications in data exchange and archiving, which usually achieves about twice the compression ratio of gzip at roughly the same speed. The