Warehousing complex data from the web

Omar Boussaïd, Jérôme Darmont, Fadila Bentayeb, Sabine Loudcher
Int. J. Web Eng. Technol.
Data warehousing and Online Analytical Processing (OLAP) technologies are now moving toward handling complex data that mostly originate from the web. However, integrating such data into a decision-support process requires representing them in a form that OLAP and/or data mining techniques can process. We present in this paper a complex data warehousing methodology that exploits eXtensible Markup Language (XML) as a pivot language. Our approach includes the integration of complex data in an ODS…

X-WACoDa: An XML-based approach for Warehousing and Analyzing Complex Data

This paper proposes a unified XML warehouse reference model that synthesizes and enhances related work and fits into a global XML warehousing and analysis approach the authors have developed; it also presents a software platform based on this model.

An overview of XML warehouse design approaches and techniques

This overview of the current state of the art on combining XML technologies with data warehousing systems presents the most relevant XML warehousing approaches proposed in the literature.

Construction and Maintenance of Heterogeneous Data Warehouses

This work describes the construction of a data warehouse by the integration of heterogeneous relational and object-relational data based on the extraction of the inter-schema relationships between the sources.

Real-time data warehousing for business intelligence

This study compares the two DMMs of real-time ETL and data warehouse multidimensional modeling on the basis of four characteristics: heterogeneous data integration, types of measures supported, aggregate query processing, and incremental maintenance.

Topological XML data cube construction

This paper analysed the distinct characteristics and requirements of a more structured OLAP to make comprehensive comparisons between structural and flat dimensions, and examined different XML cube construction models for commonly used XML recursive structures to build a conceptual model.

A global and comprehensive approach for XML data warehouse design

This paper proposes a semi-automatic approach for XML data warehouse design that starts from XML schemas as data sources, and introduces a multi-dimensional (MD) element extraction algorithm to automatically identify facts, measures and their corresponding dimensions.

Using a Pipeline Approach to Build Data Cube for Large XML Data Streams

A pipeline-based OLAP data cube construction framework for real-time, web-generated sensor data is presented; it transforms sensor data into XML streams conforming to an underlying data warehouse logical model and constructs the corresponding data cubes.

Active XML-based Web data integration

This paper proposes a generic, metadata-based, service-oriented, and event-driven approach for integrating Web data timely and autonomously, and designs and develops a framework that utilizes Web standards for tackling data heterogeneity, distribution and interoperability issues.

A Unified Approach to Multisource Data Analyses

This paper proposes a conceptual modeling solution, named Unified Cube, which blends together multidimensional data from DWs and LOD datasets without materializing them in a stationary repository, along with an analysis process that queries the different sources in a way that is transparent to decision-makers.



X-Warehousing: An XML-Based Approach for Warehousing Complex Data

An XML-based methodology, named X-Warehousing, that designs warehouses at a logical level and populates them with XML documents at a physical level; the populated physical model of the data warehouse is called the XML cube.

A Data Mining-Based OLAP Aggregation of Complex Data: Application on XML Documents

This article provides a generalized OLAP operator, called OpAC, based on the AHC, adapted for all types of data, since it deals with data cubes modeled within XML, and develops a Web application for the operator.

Integration and dimensional modeling approaches for complex data warehousing

This paper defines a generic UML model that helps represent a wide range of complex data, including their possible semantic properties, and proposes an approach that exploits data mining techniques to assist users in building relevant dimensional models.

Web multiform data structuring for warehousing

This chapter proposes a modeling process for integrating all these diverse, heterogeneous data into a unified format and views this database as an ODS, whose data will have to be re-modeled in a multidimensional way to allow their storage in a warehouse and, later, their analysis.

Data warehouse design from XML sources

This paper shows how the design of a data mart can be carried out starting directly from an XML source, and proposes a semi-automatic approach for building the conceptual schema of the data mart from the XML sources.

An Architecture Framework for Complex Data Warehouses

This paper proposes a precise, though open, definition of complex data, and presents a general architecture framework for warehousing complex data. The framework relies heavily on metadata and domain-related knowledge and rests on the XML language, which helps store data, metadata and domain-specific knowledge together and facilitates communication between the various warehousing processes.

Materialized View Selection by Query Clustering in XML Data Warehouses

This paper proposes an automatic strategy for the selection of XML materialized views that exploits a data mining technique, more precisely the clustering of the query workload, and demonstrates its efficiency, even when queries are complex.

Designing Web Warehouses from XML Schemas

A semi-automated methodology for designing web warehouses from XML sources modeled by XML Schemas, with particular relevance to the problem of detecting shared hierarchies and convergence of dependencies, and of modeling many-to-many relationships.

X-warehouse: building query pattern-driven data

This paper proposes an approach to materialize XML data warehouses based on the frequent query patterns discovered from historical queries issued by users represented as Frequent Query Pattern Trees (FreqQPTs).

Building the data warehouse

This Second Edition of Building the Data Warehouse is revised and expanded to include new techniques and applications of data warehouse technology and to update existing topics to reflect the latest thinking.