Big Data is not the New Oil: Common Misconceptions about Population Data
@inproceedings{Christen2021BigDI, title={Big Data is not the New Oil: Common Misconceptions about Population Data}, author={Peter Christen and Rainer Schnell}, year={2021} }
Databases covering all individuals of a population are increasingly used for research and decision-making. The massive size of such databases is often mistaken as a guarantee for valid inferences. However, population data have characteristics that make them challenging to use. Various assumptions on population coverage and data quality are commonly made, including how such data were captured and what types of processing have been applied to them. Furthermore, the full potential of population…
4 Citations
Privacy-preserving record linkage using autoencoders
- Computer Science, International Journal of Data Science and Analytics
- 2022
A novel autoencoder-based encoding technique for PPRL is proposed that transforms Bloom filters (BFs) into vectors of real numbers and guarantees the comparability of encodings generated by the different data owners.
Servitization for the Environment? The Impact of Data-Centric Product-Service Models
- Business, Journal of Management Information Systems
- 2022
Recent developments in data-centric technologies (e.g., big data, Internet of Things, cloud computing) have given rise to data-centric models, such as servitization. Servitization here…
The Challenges of Algorithm-Based HR Decision-Making for Personal Integrity
- Computer Science, Journal of Business Ethics : JBE
- 2019
It is suggested that critical data literacy, ethical awareness, the use of participatory design methods, and private regulatory regimes within civil society can help overcome challenges from the efficiency-driven logic of algorithm-based HR decision-making.
References
A Position Statement on Population Data Science: The Science of Data about People
- Computer Science, International Journal of Population Data Science
- 2018
These implications are the beginnings of a research agenda for Population Data Science, which if approached as a collective field can catalyze significant advances in the understanding of trends in society, health, and human behavior.
‘For good measure’: data gaps in a big data world
- Political Science, Policy Sciences
- 2020
Policy and data scientists have paid ample attention to the amount of data being collected and the challenge for policymakers to use and utilize it. However, far less attention has been paid towards…
Challenges in administrative data linkage for research
- Political Science, Big Data & Society
- 2017
This article aims to increase understanding of the implications of (i) the data linkage environment and privacy preservation; (ii) the linkage process itself (including data preparation, and deterministic and probabilistic linkage methods) and (iii) linkage quality and potential bias in linked data.
A Taxonomy of Dirty Data
- Computer Science, Data Mining and Knowledge Discovery
- 2004
A comprehensive classification of dirty data is developed for use as a framework for understanding how dirty data arise, manifest themselves, and may be cleansed to ensure proper construction of data warehouses and accurate data analysis.
Automatic Discovery of Abnormal Values in Large Textual Databases
- Computer Science, ACM J. Data Inf. Qual.
- 2016
Three techniques to automatically discover abnormal (unexpected or unusual) values in large textual databases are developed, allowing an organization to conduct efficient data exploration and improve the quality of its textual databases without requiring explicit training data.
Generating Realistic Test Datasets for Duplicate Detection at Scale Using Historical Voter Data
- Computer Science, EDBT
- 2021
This paper is the first to provide realistic test data for duplicate detection at this scale. It relies on historical data from the North Carolina voter registration, which is realistic because it contains actual voter data and facilitates generating realistic duplicates.
The role of administrative data in the big data revolution in social science research.
- Political Science, Social Science Research
- 2016
Evaluating privacy-preserving record linkage using cryptographic long-term keys and multibit trees on large medical datasets
- Computer Science, BMC Medical Informatics and Decision Making
- 2017
It is argued that the increased privacy of PPRL comes at the price of small losses in precision and recall and a large increase in computational burden and setup time.
Statistical challenges of administrative and transaction data
- Economics
- 2018
Administrative data are becoming increasingly important. They are typically the side effect of some operational exercise and are often seen as having significant advantages over alternative sources…
Economics in the age of big data
- Economics, Science
- 2014
The percentage of papers published in the American Economic Review (AER) that obtained an exemption from the AER’s data availability policy is shown, as a share of all papers published by the AER that relied on any form of data (excluding simulations and laboratory experiments).