• Corpus ID: 61179601

A Descriptive Classification of Causes of Data Quality Problems in Data Warehousing

  • Ranjit Singh, Kawaljeet Singh
Data warehousing is gaining prominence as organizations become aware of the benefits of decision-oriented and business-intelligence-oriented databases. … We hope this will help developers and implementers of warehouses to examine and analyze these issues before moving ahead with data integration and data warehouse solutions for quality decision-oriented and business-intelligence-oriented applications.


The purpose of the paper is to identify the reasons for data deficiencies, non-availability, or reachability problems at the ETL stage of data warehousing and to formulate a descriptive classification of these causes.
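As a purely illustrative sketch (not the authors' method), the non-availability symptom mentioned here can be surfaced at the extract step by counting missing or blank values per column. The row format, the column names, and the `profile_missing` helper are all hypothetical.

```python
from typing import Any


def profile_missing(rows: list[dict[str, Any]], columns: list[str]) -> dict[str, int]:
    """Count, per column, rows whose value is absent, None, or a blank string."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col)  # absent keys come back as None
            if value is None or (isinstance(value, str) and not value.strip()):
                counts[col] += 1
    return counts


# Hypothetical extract rows as they might arrive from a source system.
extract = [
    {"customer_id": 1, "name": "A. Kaur", "city": "Delhi"},
    {"customer_id": 2, "name": "", "city": None},
    {"customer_id": 3, "name": "R. Gill"},
]
print(profile_missing(extract, ["customer_id", "name", "city"]))
# → {'customer_id': 0, 'name': 1, 'city': 2}
```

A per-column count only shows where data is missing; attributing each gap to an underlying cause is the kind of analysis a descriptive classification of causes is meant to support.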
A Comparison Study of Data Scrubbing Algorithms and Frameworks in Data Warehousing
This paper presents a comparison and analysis of DS algorithms, covering the pros and cons, accuracy, and time complexity of each, together with a comparative analysis of Data Scrubbing frameworks to determine the best framework.
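The entry does not say which DS algorithms are compared. As a hedged illustration of one well-known duplicate-elimination technique used in data scrubbing, the sorted-neighborhood method, the sketch below sorts records on a blocking key and compares only records that fall within a sliding window. The record fields, the key function, the window size, and the 0.85 threshold are assumptions, and `difflib.SequenceMatcher` stands in for whatever field matcher a real scrubbing tool would use.

```python
from difflib import SequenceMatcher


def sort_key(record: dict) -> str:
    # Hypothetical blocking key: first 3 letters of surname + first 3 of given name.
    return (record["last"][:3] + record["first"][:3]).lower()


def similar(a: dict, b: dict, threshold: float) -> bool:
    # Compare full names using difflib's similarity ratio (0.0 to 1.0).
    name_a = f"{a['first']} {a['last']}".lower()
    name_b = f"{b['first']} {b['last']}".lower()
    return SequenceMatcher(None, name_a, name_b).ratio() >= threshold


def sorted_neighborhood(records: list[dict], window: int = 3,
                        threshold: float = 0.85) -> list[tuple]:
    """Return id pairs judged to be duplicates within a sliding window."""
    ordered = sorted(records, key=sort_key)  # records with similar keys end up adjacent
    pairs = []
    for i, rec in enumerate(ordered):
        # Compare each record only with the next (window - 1) records.
        for j in range(i + 1, min(i + window, len(ordered))):
            if similar(rec, ordered[j], threshold):
                pairs.append((rec["id"], ordered[j]["id"]))
    return pairs


# Hypothetical customer records; ids 1 and 3 are near-duplicates.
records = [
    {"id": 1, "first": "John", "last": "Smith"},
    {"id": 2, "first": "Mary", "last": "Jones"},
    {"id": 3, "first": "Jon", "last": "Smith"},
    {"id": 4, "first": "Ann", "last": "Taylor"},
]
print(sorted_neighborhood(records))
# → [(1, 3)]
```

After an O(n log n) sort, the window limits the work to roughly n × (window − 1) comparisons instead of all n(n − 1)/2 pairs, at the cost of missing duplicates whose keys sort far apart.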
Managing Data Source quality for data warehouse in manufacturing services
  • N. Idris, Kamsuriah Ahmad
  • Business, Computer Science
    Proceedings of the 2011 International Conference on Electrical Engineering and Informatics
  • 2011
A high-quality management system for managing data sources is proposed based on the ISO 9001:2008 standard, in the hope that it can help organizations implement and operate a quality management system.
Data Warehouses and Big Data: How to Cope With Data Quality
A survey is provided of the existing techniques for controlling the quality of data stored in DW systems and of the new solutions proposed in the literature to meet Big Data requirements.
Taxonomy of data quality problems in multidimensional Data Warehouse models
This article presents a taxonomy of data quality issues in the context of the multidimensional data models characteristic of DWs and proposes its use in future work on quality measurement.
Measuring Data Quality in a Data Warehouse Environment
An investigative case study of the errors in a data warehouse was conducted at the Swedish company Kaplan and resulted in guiding principles for improving data quality.
The existing DW system of the Bank is used as a typical DW, and its data quality is analysed using the proposed methodology; suggestions are also provided to enhance the DQ of a typical DW.
Data Quality Problems in TPC-DI Based Data Integration Processes
In order to prevent data quality problems and proactively manage data quality, a set of practical guidelines for researchers and practitioners to conduct data quality management when using the TPC-DI benchmark is proposed.
Systems Dynamics-Based Modeling of Data Warehouse Quality
Key findings include that data quality and data model quality are more important than DBMS quality for ensuring data warehouse quality, and that the number of data entry errors and the level of data complexity can be major detriments to DW quality.
Data Quality in Data warehouse: problems and solution
The purpose of the paper is to identify the reasons for data deficiencies, non-availability, or reachability problems at all the aforementioned stages of data warehousing, to classify these causes, and to propose solutions for improving data quality through Statistical Process Control (SPC) and quality engineering management.
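The entry mentions SPC without detail. A minimal sketch, assuming the monitored quality characteristic is the fraction of rows rejected per ETL batch, is a p-chart with 3-sigma limits: p̄ is the pooled rejection rate and each batch's limits are p̄ ± 3√(p̄(1 − p̄)/n). The function names and the batch counts are hypothetical, made up for illustration.

```python
import math


def p_chart_limits(rejected: list[int], totals: list[int]) -> list[tuple[float, float]]:
    """3-sigma (LCL, UCL) for each batch of a p-chart on rejection rates."""
    p_bar = sum(rejected) / sum(totals)  # pooled rejection rate
    limits = []
    for n in totals:
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)
        # Clamp to [0, 1] because a proportion cannot leave that range.
        limits.append((max(0.0, p_bar - 3 * sigma), min(1.0, p_bar + 3 * sigma)))
    return limits


def out_of_control(rejected: list[int], totals: list[int]) -> list[int]:
    """Indices of batches whose rejection rate falls outside its control limits."""
    flagged = []
    limits = p_chart_limits(rejected, totals)
    for i, (r, n, (lcl, ucl)) in enumerate(zip(rejected, totals, limits)):
        if not lcl <= r / n <= ucl:
            flagged.append(i)
    return flagged


# Hypothetical ETL loads: rows rejected by validation out of rows processed.
rejected = [12, 9, 15, 11, 60]
totals = [1000, 1000, 1000, 1000, 1000]
print(out_of_control(rejected, totals))
# → [4]
```

In practice the limits would usually be set from a baseline period believed to be in control; pooling all batches, as this sketch does, lets an unusually bad batch widen its own limits.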


Warehouse Creation - A Potential Roadblock to Data Warehousing
The principal goal of this paper is to identify the common issues in data integration and data-warehouse creation, which have been studied for about two decades.
A Taxonomy of Dirty Data
A comprehensive classification of dirty data is developed for use as a framework for understanding how dirty data arise, manifest themselves, and may be cleansed to ensure proper construction of data warehouses and accurate data analysis.
Key issues in achieving data quality and consistency in data warehousing among large organisations in Australia
  • A. Rudra, Emilie Yeo
  • Business
    Proceedings of the 32nd Annual Hawaii International Conference on Systems Sciences. 1999. HICSS-32. Abstracts and CD-ROM of Full Papers
  • 1999
It is found that the quality of data in a data warehouse could be influenced by factors like: data not fully captured, heterogeneous system integration and lack of policy and planning from management.
A classification of semantic conflicts in heterogeneous database systems
A classification of semantic conflicts is proposed that can be used as the basis for the incremental discovery and resolution of these conflicts and that provides a systematic representation of alternative semantic interpretations of conflicts during the reconciliation process.
Legacy Information Systems: Issues and Directions
An overview of existing research is offered and two promising methodologies for legacy information system migration are presented, which can support organizations into the future.
The ISSN system, which was established in the 1970s for the identification of printed serial publications, is also a powerful system for the identification of electronic resources, thanks to
Challenges with legacy data: Knowing your data enemy is the first step in overcoming it
  • Practice Leader, Agile Development, Rational Methods Group, IBM,
  • 2001
Dealing with Missing Values In The Data Warehouse
  • 1999
Data Cleaning: Problems and Current Approaches
This work classifies data quality problems that are addressed by data cleaning and provides an overview of the main solution approaches and discusses current tool support for data cleaning.