Paper 928 Complex Group-By Queries for XML

Abstract

The popularity of XML as a data exchange standard has led to the emergence of powerful XML query languages like XQuery [21] and to studies on XML query optimization. Of late, there is considerable interest in analytical processing of XML data (e.g., [2, 3]). As pointed out by Borkar and Carey in [3], even for data integration, there is a compelling need for performing various group-by style aggregate operations. A core operator needed for analytics is the group-by operator, which is widely used in relational as well as OLAP database applications. XQuery requires group-by operations to be simulated using nesting [2].

Studies addressing the need for XML grouping fall into two broad categories: (1) provide support for grouping at the logical or physical level [6], and recognize grouping operations in nested queries and rewrite them with grouping operations [4, 5, 9, 12]; (2) extend XQuery FLWOR expressions with explicit constructs similar to the group-by, order-by and having clauses in SQL [3, 2]. However, direct algorithmic support for a group-by operator has not been explored. In this paper, we focus on efficient processing of a group-by operator for XML, with the additional goal of supporting a full spectrum of aggregation operations, including holistic ones such as median() [8] and complex nested aggregations, together with a having clause, as well as moving window aggregation.

Consider the simple catalogue example in Figure 1. This can be part of an input XML database, or an intermediate result of a query. The catalogue is heterogeneous: it contains information about books, music CDs, etc. Books are organized by Subject, e.g., physics, chemistry. For each book, there is information on its Title, Author, Year, #Sold, Price, (publisher) Name, etc. Books may have multiple authors. The data value at a leaf node is shown in italics. The node id of a node is also shown for future discussion. Consider the following nested group-by query Q1.
Although we could follow the syntax proposed in [2], syntax is not our main focus, so we use a more concise form. We also omit the selection part of the query and focus only on the aggregation part.
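To make the kind of aggregation discussed here concrete, the following is a minimal Python sketch of a group-by over a small catalogue, with a holistic aggregate (median) and a having-style filter. It is not query Q1 and not the algorithm of this paper: the XML fragment, the element and attribute names (Catalogue, Sold, the name attribute on Subject), the values, and the helper group_books are all assumptions made for illustration, and Figure 1 may be structured differently.

```python
import statistics
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical catalogue fragment. Element names and values are illustrative
# assumptions, not the contents of Figure 1.
CATALOGUE = """
<Catalogue>
  <Subject name="physics">
    <Book><Title>A</Title><Author>X</Author><Year>2001</Year><Sold>10</Sold><Price>30</Price></Book>
    <Book><Title>B</Title><Author>Y</Author><Author>Z</Author><Year>2003</Year><Sold>5</Sold><Price>50</Price></Book>
    <Book><Title>C</Title><Author>X</Author><Year>2003</Year><Sold>7</Sold><Price>40</Price></Book>
  </Subject>
  <Subject name="chemistry">
    <Book><Title>D</Title><Author>W</Author><Year>2001</Year><Sold>2</Sold><Price>20</Price></Book>
  </Subject>
</Catalogue>
"""


def group_books(root, min_count=1):
    """Group books by subject; return count and median price per group.

    min_count plays the role of a having clause: groups with fewer
    books than min_count are dropped from the result.
    """
    prices_by_subject = defaultdict(list)
    for subject in root.iter("Subject"):
        for book in subject.iter("Book"):
            prices_by_subject[subject.get("name")].append(float(book.findtext("Price")))

    # median is holistic: it needs all values of a group, not a running summary.
    return {
        name: {"count": len(prices), "median_price": statistics.median(prices)}
        for name, prices in prices_by_subject.items()
        if len(prices) >= min_count
    }


result = group_books(ET.fromstring(CATALOGUE), min_count=2)
print(result)
```

With min_count=2, only the physics group survives the filter. Because median cannot be computed from partial summaries, the sketch keeps every price of a group in memory, which is the cost that makes holistic aggregates harder to process than count or sum.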

Cite this paper

@inproceedings{Gokhale2006Paper9C, title={Paper 928 Complex Group-By Queries for XML}, author={Chaitanya Gokhale and Noopur Gupta and Pawan Kumar and B. Aditya Prakash}, year={2006} }