Optimizing ETL processes in data warehouses

@article{Simitsis2005OptimizingEP,
  title={Optimizing ETL processes in data warehouses},
  author={Alkis Simitsis and Panos Vassiliadis and Timos K. Sellis},
  journal={21st International Conference on Data Engineering (ICDE'05)},
  year={2005},
  pages={564--575}
}
Extraction-transformation-loading (ETL) tools are pieces of software responsible for the extraction of data from several sources, their cleansing, customization and insertion into a data warehouse. [...] We consider each ETL workflow as a state and fabricate the state space through a set of correct state transitions. Moreover, we provide algorithms towards the minimization of the execution cost of an ETL workflow.
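The state-space formulation in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes a linear workflow, a toy cost model (rows processed times a per-row cost), a single transition type (swapping two adjacent activities), and a simplified correctness rule based on which attributes each activity reads and writes. All names, costs, and selectivities below are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    name: str
    cost_per_row: float               # hypothetical per-row processing cost
    selectivity: float                # fraction of input rows passed downstream
    reads: frozenset = frozenset()    # attributes the activity needs
    writes: frozenset = frozenset()   # attributes the activity generates


def workflow_cost(workflow, input_rows):
    """Toy cost model: each activity pays cost_per_row for every row it receives."""
    total, rows = 0.0, float(input_rows)
    for act in workflow:
        total += rows * act.cost_per_row
        rows *= act.selectivity
    return total


def can_swap(first, second):
    """Assumed correctness rule: neither activity reads what the other writes."""
    return not (first.writes & second.reads) and not (second.writes & first.reads)


def neighbors(workflow):
    """Yield every state reachable by one legal swap of adjacent activities."""
    for i in range(len(workflow) - 1):
        a, b = workflow[i], workflow[i + 1]
        if can_swap(a, b):
            yield workflow[:i] + (b, a) + workflow[i + 2:]


def exhaustive_search(initial, input_rows):
    """Breadth-first enumeration of the reachable state space, keeping the cheapest state."""
    initial = tuple(initial)
    best, best_cost = initial, workflow_cost(initial, input_rows)
    seen, queue = {initial}, deque([initial])
    while queue:
        state = queue.popleft()
        for nxt in neighbors(state):
            if nxt in seen:
                continue
            seen.add(nxt)
            queue.append(nxt)
            cost = workflow_cost(nxt, input_rows)
            if cost < best_cost:
                best, best_cost = nxt, cost
    return best, best_cost


# Hypothetical three-activity workflow: a conversion followed by two filters.
convert = Activity("add_euro_amount", 2.0, 1.0, writes=frozenset({"amount_eur"}))
not_null = Activity("filter_null_keys", 1.0, 0.5, reads=frozenset({"key"}))
threshold = Activity("filter_amount_eur", 1.0, 0.1, reads=frozenset({"amount_eur"}))

initial = (convert, not_null, threshold)
best, best_cost = exhaustive_search(initial, input_rows=1000)
print([a.name for a in best], best_cost)
```

In this toy setting the search moves the selective null-key filter ahead of the costly conversion, while the filter on `amount_eur` cannot move ahead of the activity that generates that attribute. Exhaustive enumeration grows quickly with workflow size, which is why heuristic alternatives are of interest.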
Logical Optimization of ETL Workflows
TLDR
This paper delves into the logical optimization of ETL processes, modeling it as a state-space search problem and provides algorithms towards the minimization of the execution cost of an ETL workflow.
State-space optimization of ETL workflows
TLDR
This paper delves into the logical optimization of ETL processes, modeling it as a state-space search problem, and provides one exhaustive and two heuristic algorithms toward the minimization of the execution cost of an ETL workflow.
Query Optimizer for the ETL Process in Data Warehouses
TLDR
This paper delves into the optimization of queries by recommending indices, which reduces the cost of the queries and improves their performance.
Optimization of ETL Work Flow in Data Warehouse
TLDR
This paper presents a stepwise process for implementing one ETL scenario with the help of ARKTOS II, in order to minimize the time required to complete the ETL workflow and the resources needed for the ETL tasks.
Proposed architecture for ETL workflow generator*
TLDR
An architecture is presented to automatically integrate a system that generates mappings and transformations based on ontologies with traditional ETL tools; the architecture of the prototyped system is message-based, which enables parallel processing.
Mapping conceptual to logical models for ETL processes
TLDR
This paper describes the mapping of the conceptual to the logical model, and identifies how a conceptual entity is mapped to a logical entity and determines the execution order in the logical workflow using information adapted from the conceptual model.
SYSTEMATIC ETL MANAGEMENT – EXPERIENCES WITH HIGH-LEVEL OPERATORS (Practice-Oriented)
TLDR
This paper builds an ETL management framework to improve this difficult task by providing high-level operations, such as searching, matching, or merging ETL workflows, and presents the lessons learned throughout the implementation of a prototypical ETL management framework.
Optimized incremental ETL jobs for maintaining data warehouses
TLDR
This paper presents a new transformation-based approach to automatically derive incremental ETL jobs. The approach relies on a simplification of the underlying update propagation process that computes so-called safe updates instead of true ones.
A taxonomy of ETL activities
TLDR
A black-box approach is followed: a taxonomy is provided that characterizes ETL activities in terms of the relationship of their input to their output, and it is shown how the proposed taxonomy can be used in the construction of larger modules, i.e., ETL archetype patterns, which are used for the composition and optimization of ETL workflows.
AN OVERVIEW ON PHYSICAL IMPLEMENTATION OF SECURE ETL WORKFLOW
TLDR
This work considers each ETL workflow as a state and fabricates the state space through a set of correct state transitions, which results in logical optimization of ETL processes.
...
...

References

SHOWING 1-10 OF 24 REFERENCES
Optimizing ETL processes in data warehouse environments
TLDR
This paper delves into the logical optimization of ETL processes, modeling it as a state-space search problem and provides algorithms towards the minimization of the execution cost of an ETL workflow.
A Framework for the Design of ETL Scenarios
TLDR
This paper describes a framework for the declarative specification of ETL scenarios with two main characteristics, genericity and customization, and presents a palette of several templates, representing frequently used ETL activities along with their semantics and their interconnection.
Lineage tracing for general data warehouse transformations
TLDR
This work formally defines the lineage tracing problem in the presence of general data warehouse transformations and presents algorithms for lineage tracing in this environment, which can be used as the basis for a lineage tracing tool in a general warehousing setting.
Efficient resumption of interrupted warehouse loads
TLDR
This work develops a resumption algorithm called DR that imposes no overhead and relies only on the high-level properties of the transformations of the data, and shows, through experiments using commercial software, that DR can lead to a ten-fold reduction in resumption time.
Data Cleaning: Problems and Current Approaches
TLDR
This work classifies data quality problems that are addressed by data cleaning and provides an overview of the main solution approaches and discusses current tool support for data cleaning.
A Transactional Model for Data Warehouse Maintenance
TLDR
TxnWrap is complementary to maintenance algorithms from the literature by removing concurrency issues from their consideration and proposes a multiversion concurrency control technique appropriate for loosely-coupled environments with autonomous sources.
AJAX: an extensible data cleaning tool
TLDR
The AJAX system is presented, applied to two real-world problems: the consolidation of a telecommunication database, and the conversion of a dirty database of bibliographic references into a set of clean, normalized, and redundancy-free relational tables maintaining the same data.
Potter's Wheel: An Interactive Data Cleaning System
TLDR
Potter’s Wheel is presented, an interactive data cleaning system that tightly integrates transformation and discrepancy detection; users can gradually build a transformation as discrepancies are found, and clean the data without writing complex programs or enduring long delays.
Query Optimization in Database Systems
TLDR
These methods are presented in the framework of a general query evaluation procedure using the relational calculus representation of queries, and nonstandard query optimization issues such as higher level query evaluation, query optimization in distributed databases, and use of database machines are addressed.
Data Transformation Services
TLDR
BCP (bulk copy program) and DTS (Data Transformation Services) are two tools SQL Server provides for transferring large amounts of data into or out of the database.
...
...