A Survey of Extract-Transform-Load Technology
@article{Vassiliadis2009ASO, title={A Survey of Extract-Transform-Load Technology}, author={Panos Vassiliadis}, journal={Int. J. Data Warehous. Min.}, year={2009}, volume={5}, pages={1--27}}
The software processes that facilitate the original loading and the periodic refreshment of the data warehouse contents are commonly known as Extraction-Transformation-Loading (ETL) processes. […] To this end, we organize the coverage of the field as follows: (a) first, we cover the conceptual and logical modeling of ETL processes, along with some design methods, (b) we visit each stage of the E-T-L triplet, and examine problems that fall within each of these stages, (c) we discuss problems that…
161 Citations
Efficient incremental loading in ETL processing for real-time data integration
- Computer Science
- Innovations in Systems and Software Engineering
- 2019
This paper focuses on the alternative ETL development approach of hand coding and presents a comparative evaluation of some well-known code-based open-source ETL tools developed by the academic world.
A Unified Model Driven Methodology for Data Warehouses and ETL Design
- Computer Science
- ICEIS
- 2011
A generic, unified, and semi-automated method integrates DW and ETL process design, with the transformation rules formalized using the Query/View/Transformation (QVT) language.
A New Approach for Conceptual Extraction-Transformation-Loading Process Modeling
- Computer Science
- Int. J. Ambient Comput. Intell.
- 2019
An MBSE-based approach to automate the SysML model's validation by using the No Magic simulator is presented; the main objective is to overcome the gap between modeling and simulation and to examine the performance of the proposed SysML model.
Research on the Stream ETL Process
- Computer Science
- BDAS
- 2014
A first implementation of the stream ETL process, which originates from the model and concept of a Stream Data Warehouse, is presented, along with the results of an accuracy and efficiency analysis.
GENUS: An ETL tool treating the Big Data Variety
- Computer Science
- 2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA)
- 2016
A new ETL tool, GENUS, is introduced; it extracts its data from different document types (text, image, and video), transforms them, and loads them into a document data warehouse, and it is implemented and validated in a commercial case study.
ETL Processes in the Era of Variety
- Computer Science
- Trans. Large Scale Data Knowl. Centered Syst.
- 2018
This paper makes different types of data sources generic and shows the impact of the genericity of operators in the ETL workflow; a Web-service-driven approach for orchestrating the ETL flows is given, and the extracted and merged data obtained by the ETL workflow are deployed according to their favorite stores.
A Comparative Review of Data Warehousing ETL Tools with New Trends and Industry Insight
- Computer Science
- 2017 IEEE 7th International Advance Computing Conference (IACC)
- 2017
This paper compares different aspects of some popular ETL tools (Informatica, Datastage, Ab Initio, Oracle Data Integrator, SSIS), analyses their advantages and disadvantages, and highlights some salient features.
SimpleETL: ETL Processing by Simple Specifications
- Computer Science
- DOLAP
- 2018
The general framework SimpleETL is presented, which is currently used for Extract-Transform-Load (ETL) processing in a company with such requirements and enables, e.g., data scientists to program complete and complex ETL solutions very efficiently with only a few lines of code.
Using OCL for Automatically Producing Multidimensional Models and ETL Processes
- Computer Science
- DaWaK
- 2012
This paper presents a unified conceptual model that describes both the DW and its ETL process using the constellation model and the Object Constraint Language (OCL) and describes the implemented prototype architecture.
References
Conceptual modeling for ETL processes
- Computer Science
- DOLAP '02
- 2002
The proposed conceptual model is customized for the tracing of inter-attribute relationships and the respective ETL activities in the early stages of a data warehouse project, and is constructed in a customizable and extensible manner, so that the designer can enrich it with his own re-occurring patterns for ETL activities.
Modeling ETL activities as graphs
- Computer Science
- DMDW
- 2002
This paper focuses on the logical design of the ETL scenario of a data warehouse, which is based on a formal logical model that includes the data stores, activities and their constituent parts as a graph called the Architecture Graph.
Optimizing ETL processes in data warehouses
- Computer Science
- 21st International Conference on Data Engineering (ICDE'05)
- 2005
This paper delves into the logical optimization of ETL processes, modeling it as a state-space search problem and provides algorithms towards the minimization of the execution cost of an ETL workflow.
A method for the mapping of conceptual designs to logical blueprints for ETL processes
- Computer Science
- Decis. Support Syst.
- 2008
A UML Based Approach for Modeling ETL Processes in Data Warehouses
- Computer Science
- ER
- 2003
This paper provides the necessary mechanisms for an easy and quick specification of the common operations defined in these ETL processes, such as the integration of different data sources, the transformation between source and target attributes, the generation of surrogate keys, and so on.
A Framework for the Design of ETL Scenarios
- Computer Science
- CAiSE
- 2003
This paper describes a framework for the declarative specification of ETL scenarios with two main characteristics, genericity and customization, and presents a palette of several templates representing frequently used ETL activities, along with their semantics and their interconnection.
Designing ETL processes using semantic web technologies
- Computer Science
- DOLAP '06
- 2006
It is argued that ontologies constitute a very suitable model for this purpose, and it is shown how the usage of ontologies can enable a high degree of automation in the construction of an ETL design.
State-space optimization of ETL workflows
- Computer Science
- IEEE Transactions on Knowledge and Data Engineering
- 2005
This paper delves into the logical optimization of ETL processes, modeling it as a state-space search problem, and provides one exhaustive and two heuristic algorithms toward the minimization of the execution cost of an ETL workflow.
Towards a Benchmark for ETL Workflows
- Computer Science
- QDB
- 2007
This paper investigates the main characteristics and peculiarities of ETL processes and proposes a principled organization of test suites for the problem of experimenting with ETL scenarios.
Data Mapping Diagrams for Data Warehouse Design with UML
- Computer Science
- ER
- 2004
This paper presents a disciplined framework for the modeling of the relationships between sources and targets in different levels of granularity and extends UML (Unified Modeling Language) to model attributes as first-class citizens.