A Survey of Extract-Transform-Load Technology

@article{Vassiliadis2009ASO,
  title={A Survey of Extract-Transform-Load Technology},
  author={Panos Vassiliadis},
  journal={Int. J. Data Warehous. Min.},
  year={2009},
  volume={5},
  pages={1-27}
}
The software processes that facilitate the original loading and the periodic refreshment of the data warehouse contents are commonly known as Extraction-Transformation-Loading (ETL) processes. [...] To this end, we organize the coverage of the field as follows: (a) first, we cover the conceptual and logical modeling of ETL processes, along with some design methods, (b) we visit each stage of the E-T-L triplet, and examine problems that fall within each of these stages, (c) we discuss problems that…
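As a rough illustration of the E-T-L triplet and of the difference between the original loading and a periodic refreshment, a minimal Python sketch follows. It is not taken from the survey: the `Warehouse` class, the row schema (`id`, `name`, `updated_at`), and the timestamp watermark used to detect changed rows are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    """Toy target store keyed by record id (hypothetical schema)."""
    rows: dict = field(default_factory=dict)
    last_loaded: int = 0  # highest source timestamp already loaded


def extract(source, since):
    """Extraction: pull only the source rows changed after `since`."""
    return [r for r in source if r["updated_at"] > since]


def transform(rows):
    """Transformation: reject rows without a key and normalise names."""
    cleaned = []
    for r in rows:
        if r.get("id") is None:
            continue  # a record that cannot be keyed is dropped
        cleaned.append({**r, "name": r["name"].strip().title()})
    return cleaned


def load(warehouse, rows):
    """Loading: upsert rows and advance the refresh watermark."""
    for r in rows:
        warehouse.rows[r["id"]] = r
        warehouse.last_loaded = max(warehouse.last_loaded, r["updated_at"])
    return len(rows)


def run_etl(source, warehouse):
    """One E-T-L pass: the first call is the original load, later calls refresh."""
    return load(warehouse, transform(extract(source, warehouse.last_loaded)))


source = [
    {"id": 1, "name": "  ada lovelace ", "updated_at": 10},
    {"id": None, "name": "orphan", "updated_at": 11},
    {"id": 2, "name": "alan turing", "updated_at": 12},
]
dw = Warehouse()
initial = run_etl(source, dw)    # original loading of the warehouse
source.append({"id": 1, "name": "ada king", "updated_at": 20})
refreshed = run_etl(source, dw)  # periodic refreshment picks up only the change
```

In this sketch the first pass loads the two keyed rows and the second pass loads only the row that changed afterwards. A production pipeline would typically rely on change data capture or source logs rather than a simple timestamp comparison.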
Efficient incremental loading in ETL processing for real-time data integration
TLDR
This paper focuses on an alternative ETL development approach, hand coding, and presents a comparative evaluation of some well-known code-based open-source ETL tools developed by the academic world.
A Unified Model Driven Methodology for Data Warehouses and ETL Design
TLDR
A generic, unified, and semi-automated method that integrates the design of DW and ETL processes; the transformation rules are formalized using the Query/View/Transformation (QVT) language.
A New Approach for Conceptual Extraction-Transformation-Loading Process Modeling
TLDR
An MBSE-based approach to automating the validation of the SysML model using the No Magic simulator is presented; the main objective is to bridge the gap between modeling and simulation and to examine the performance of the proposed SysML model.
Research on the Stream ETL Process
TLDR
The first implementation of the stream ETL process, which originates from the model and concept of a Stream Data Warehouse, is presented, along with the results of an accuracy and efficiency analysis.
GENUS: An ETL tool treating the Big Data Variety
  • Salwa Souissi, Mounir Ben Ayed
  • Computer Science
  • 2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA)
  • 2016
TLDR
A new ETL tool, GENUS, is introduced; it extracts data from different document types (text, image, and video), transforms them, and loads them into a document data warehouse, and it is implemented and validated in a commercial case study.
ETL Processes in the Era of Variety
TLDR
This paper makes different types of data sources generic and shows the impact of the genericity of operators in the ETL workflow; a Web-service-driven approach for orchestrating the ETL flows is given, and the extracted and merged data obtained by the ETL workflow are deployed according to their favorite stores.
A Comparative Review of Data Warehousing ETL Tools with New Trends and Industry Insight
TLDR
This paper compares different aspects of some popular ETL tools (Informatica, Datastage, Ab Initio, Oracle Data Integrator, SSIS), analyses their advantages and disadvantages, and highlights some salient features.
SimpleETL: ETL Processing by Simple Specifications
TLDR
The general framework SimpleETL is presented, which is currently used for Extract-Transform-Load (ETL) processing in a company with such requirements and enables, e.g., data scientists to program complete and complex ETL solutions very efficiently with only a few lines of code.
Using OCL for Automatically Producing Multidimensional Models and ETL Processes
TLDR
This paper presents a unified conceptual model that describes both the DW and its ETL process using the constellation model and the Object Constraint Language (OCL) and describes the implemented prototype architecture.

References

SHOWING 1-10 OF 62 REFERENCES
Conceptual modeling for ETL processes
TLDR
The proposed conceptual model is customized for the tracing of inter-attribute relationships and the respective ETL activities in the early stages of a data warehouse project and is constructed in a customizable and extensible manner, so that the designer can enrich it with his own recurring patterns for ETL activities.
Modeling ETL activities as graphs
TLDR
This paper focuses on the logical design of the ETL scenario of a data warehouse, based on a formal logical model that represents the data stores, the activities, and their constituent parts as a graph, which is called the Architecture Graph.
Optimizing ETL processes in data warehouses
TLDR
This paper delves into the logical optimization of ETL processes, modeling it as a state-space search problem and provides algorithms towards the minimization of the execution cost of an ETL workflow.
A UML Based Approach for Modeling ETL Processes in Data Warehouses
TLDR
This paper provides the necessary mechanisms for an easy and quick specification of the common operations defined in these ETL processes, such as the integration of different data sources, the transformation between source and target attributes, the generation of surrogate keys, and so on.
A Framework for the Design of ETL Scenarios
TLDR
This paper describes a framework for the declarative specification of ETL scenarios with two main characteristics: genericity and customization and presents a palette of several templates, representing frequently used ETL activities along with their semantics and their interconnection.
Designing ETL processes using semantic web technologies
TLDR
It is argued that ontologies constitute a very suitable model for this purpose, and it is shown how the usage of ontologies can enable a high degree of automation in the construction of an ETL design.
State-space optimization of ETL workflows
TLDR
This paper delves into the logical optimization of ETL processes, modeling it as a state-space search problem, and provides one exhaustive and two heuristic algorithms toward the minimization of the execution cost of an ETL workflow.
Towards a Benchmark for ETL Workflows
TLDR
This paper investigates the main characteristics and peculiarities of ETL processes and proposes a principled organization of test suites for the problem of experimenting with ETL scenarios.
Data Mapping Diagrams for Data Warehouse Design with UML
TLDR
This paper presents a disciplined framework for the modeling of the relationships between sources and targets in different levels of granularity and extends UML (Unified Modeling Language) to model attributes as first-class citizens.