Data Cleaning: Detecting, Diagnosing, and Editing Data Abnormalities

@article{vandenBroeck2005DataCD,
  title={Data Cleaning: Detecting, Diagnosing, and Editing Data Abnormalities},
  author={Jan van den Broeck and Solveig Argeseanu Cunningham and Roger Eeckels and Kobus Herbst},
  journal={PLoS Medicine},
  year={2005},
  volume={2}
}
In this policy forum the authors argue that data cleaning is an essential part of the research process, and should be incorporated into study design. 

Methods for Cleaning and Managing a Nurse-Led Registry
TLDR
The methods described provide a structured way for nurses and their collaborators to clean and manage registries and resulted in high-quality data, which was confirmed by missing data analysis.
Better Reporting, Better Research: Guidelines and Guidance in PLoS Medicine
TLDR
PLoS Medicine announces a new section: Guidelines and Guidance, where guidelines and guidance for medical practice are presented for the first time.
Statistics Corner: Data Cleaning-I
  • Kamal Kishore
  • Medicine
    Journal of Postgraduate Medicine, Education and Research
  • 2019
TLDR
The investigator was in a dilemma about whether to share the data with a statistician before or after cleaning, and found some answers regarding the role and responsibilities of the investigator in data cleaning.
Assumptions made when preparing drug exposure data for analysis have an impact on results: An unreported step in pharmacoepidemiology studies
TLDR
This study aimed to develop a framework to define and document drug data preparation and to examine the impact of different assumptions on results.
Data Cleaning and Data Visualization Systems for Learning Analytics
TLDR
The up-to-date findings and outcomes of the research, design, and development projects at the InterLabs Research Institute at Bradley University that are focused on the analysis and testing of effective systems to clean and visualize student academic performance data for learning analytics are presented.
Understanding Your Data
Too Much Information: Research Issues Associated With Large Databases
TLDR
Registries and administrative databases provide healthcare researchers with increasing opportunities to address a wide variety of important practice and patient care questions, and researchers are encouraged to explore large data sets to improve patient safety and quality care.
Targeting Non-obvious Errors in Death Certificates
TLDR
Mortality statistics are much used although their accuracy is often questioned, and current methods only capture obvious errors in death certification.
A systematic approach to initial data analysis is good research practice.
Making a distinction between data cleaning and central monitoring in clinical trials
TLDR
Early clinical trials collected data on punch cards and then on paper, but now, with increasing use of electronic data capture to replace paper forms, staff at trial sites are entering data directly into databases and are prompted in real time with automated data checks.

References

SHOWING 1-10 OF 35 REFERENCES
Analysis of Incomplete Multivariate Data
TLDR
The Normal Model Methods for Categorical Data Loglinear Models Methods for Mixed Data and Inference by Data Augmentation Methods for Normal Data provide insights into the construction of categorical and mixed data models.
Clinical Data Management
From the Publisher: The first comprehensive volume on the subject of clinical data management, this book contains concise, well-researched information covering all aspects of data management from
Post-randomisation exclusions: the intention to treat principle and excluding patients from analysis
TLDR
The authors consider the circumstances when it may be possible to exclude patients from the analysis of data in clinical trials, even in an intention to treat trial.
Missing data
  • John L.P. Thompson, G. Levy
  • Mathematics
    Amyotrophic lateral sclerosis and other motor neuron disorders : official publication of the World Federation of Neurology, Research Group on Motor Neuron Diseases
  • 2004
TLDR
The importance of missing data in RCTs is emphasized, how the problem can be handled in an unbiased way by imputation procedures is discussed, and some recommendations for trial design and conduct are made that are tailored to RCTs for ALS.
Data Base Error Trapping and Prediction
TLDR
This work develops and analyzes models for a class of problems involving inferences about uncertain numbers of errors in data bases and generates inferences in terms of predictive distributions for the numbers of undetected errors.
Attrition in longitudinal studies. How to deal with missing data.
A product perspective on total data quality management
TLDR
The purpose of this TDQM methodology is to deliver high-quality information products (IP) to information consumers, and it aims to facilitate the implementation of an organization’s overall data quality policy formally expressed by top management.
Editing data: what difference do consistency checks make?
TLDR
The authors examined five possible approaches to handling data inconsistencies and the effect that each has on point estimates of current cigarette use in a self-administered school-based survey of tobacco use, attitudes, and behaviors in Florida.
Practical statistics for medical research
TLDR
Practical Statistics for Medical Research is a problem-based text for medical researchers, medical students, and others in the medical arena who need to use statistics but have no specialized mathematics background.