AI-based Re-identification of Behavioral Clickstream Data
@article{Vamosi2022AIbasedRO,
  title   = {AI-based Re-identification of Behavioral Clickstream Data},
  author  = {Stefan Vamosi and Michaela D. Platzer and Thomas Reutterer},
  journal = {ArXiv},
  year    = {2022},
  volume  = {abs/2201.10351}
}
AI-based face recognition, i.e., the re-identification of individuals within images, is an already well-established technology for video surveillance, user authentication, tagging photos of friends, etc. This paper demonstrates that similar techniques can be applied to successfully re-identify individuals purely based on their behavioral patterns. In contrast to de-anonymization attacks based on record linkage, these methods do not require any overlap in data points between a released…
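The abstract is truncated here, but the core idea it describes (learning an identity-revealing representation of behavioral sequences and matching individuals by similarity) can be illustrated with a minimal, hypothetical sketch. The placeholder vectors below stand in for embeddings produced by a trained sequence encoder; none of this code is from the paper.

```python
# Minimal, hypothetical sketch: re-identification as nearest-neighbour search in an
# embedding space. The random vectors stand in for embeddings of user event sequences
# produced by a trained sequence encoder (e.g. one optimized with a triplet loss).
import numpy as np

rng = np.random.default_rng(0)
n_users, dim = 1000, 128

released = rng.normal(size=(n_users, dim))                     # embeddings of the released dataset
background = released + 0.1 * rng.normal(size=(n_users, dim))  # attacker's auxiliary observations

# L2-normalise so that dot products equal cosine similarity
released /= np.linalg.norm(released, axis=1, keepdims=True)
background /= np.linalg.norm(background, axis=1, keepdims=True)

# Match every background user to the most similar released embedding
matches = (background @ released.T).argmax(axis=1)
accuracy = (matches == np.arange(n_users)).mean()
print(f"re-identification accuracy: {accuracy:.2%}")
```

The same nearest-neighbour step applies regardless of how the embeddings are obtained; the quality of the encoder determines how often the correct individual is ranked first.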
References
Showing 1-10 of 17 references
How To Break Anonymity of the Netflix Prize Dataset
- Computer Science, Economics · ArXiv
- 2006
This work presents a new class of statistical de-anonymization attacks against high-dimensional micro-data, such as individual preferences, recommendations, transaction records and so on, and demonstrates that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber's record in the dataset.
Unique in the shopping mall: On the reidentifiability of credit card metadata
- Economics · Science
- 2015
It is shown that four spatiotemporal points are enough to uniquely reidentify 90% of individuals and that knowing the price of a transaction increases the risk of reidentification by 22%, on average.
FaceNet: A unified embedding for face recognition and clustering
- Computer Science · 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
A system that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity, and achieves state-of-the-art face recognition performance using only 128 bytes per face.
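The FaceNet embedding f is trained with a triplet loss that pulls an anchor example toward a positive example of the same identity and pushes it away from a negative example by at least a margin α:

$$ L = \sum_{i} \left[\, \lVert f(x_i^a) - f(x_i^p) \rVert_2^2 - \lVert f(x_i^a) - f(x_i^n) \rVert_2^2 + \alpha \,\right]_+ $$

The same loss structure carries over to behavioral data once "anchor", "positive", and "negative" are read as sequences from the same or from different individuals.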
Holdout-Based Empirical Assessment of Mixed-Type Synthetic Data
- Computer Science · Frontiers in Big Data
- 2021
A novel holdout-based empirical assessment framework is introduced for quantifying both the fidelity and the privacy risk of synthetic data solutions for mixed-type tabular data; the results yield strong evidence that the synthesizer indeed learned to generalize patterns and is independent of individual training records.
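As a rough illustration of how a holdout-based privacy check can be framed (a sketch only; the generic nearest-record distance below is not necessarily the exact measure used in the cited framework): if the synthesizer generalizes rather than memorizes, synthetic records should sit no closer to the training records than to an unseen holdout set.

```python
# Sketch of a holdout-based privacy check: compare each synthetic record's distance
# to its closest training record against its distance to the closest holdout record.
# Similar distributions suggest the generator did not memorize individual records.
import numpy as np

def closest_record_distances(synthetic, reference):
    # pairwise Euclidean distances, then the minimum per synthetic record
    diffs = synthetic[:, None, :] - reference[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).min(axis=1)

rng = np.random.default_rng(0)
train, holdout = rng.normal(size=(500, 8)), rng.normal(size=(500, 8))
synthetic = rng.normal(size=(500, 8))  # placeholder for generated records

dcr_train = closest_record_distances(synthetic, train)
dcr_holdout = closest_record_distances(synthetic, holdout)
print(f"mean distance to training: {dcr_train.mean():.3f}, to holdout: {dcr_holdout.mean():.3f}")
```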
They Who Must Not Be Identified - Distinguishing Personal from Non-Personal Data Under the GDPR
- Computer Science · SSRN Electronic Journal
- 2019
It is concluded that there always remains a residual risk when anonymisation is used and the concluding section links this conclusion more generally to the notion of risk in the GDPR.
Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
- Computer Science · Internet Measurement Conference
- 2020
This work explores whether and how generative adversarial networks can be used to incentivize data sharing by enabling a generic framework for sharing synthetic datasets with minimal expert knowledge, and designs a custom workflow called DoppelGANger that achieves up to 43% better fidelity than baseline models.
Protecting customer privacy when marketing with second-party data
- Business, Computer Science
- 2017
Unique in the Crowd: The privacy bounds of human mobility
- Computer Science · Scientific Reports
- 2013
It is found that in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier's antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals.
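As a hypothetical illustration of how such unicity figures can be estimated (toy data and a generic resampling procedure, not code from the cited study), one can sample a few points an adversary might know about a person and count how often those points single out exactly one trace:

```python
# Hypothetical sketch of a unicity estimate: the fraction of users whose trace is
# uniquely pinned down by p randomly chosen points known to an adversary.
import random

def unicity(traces, p=4, trials=200, seed=1):
    """traces: dict user_id -> set of (location, hour) tuples."""
    rng = random.Random(seed)
    users = [u for u, t in traces.items() if len(t) >= p]
    unique = 0
    for _ in range(trials):
        u = rng.choice(users)
        known = set(rng.sample(sorted(traces[u]), p))          # points the adversary knows
        candidates = [v for v, t in traces.items() if known <= t]
        unique += (candidates == [u])                          # exactly one matching trace
    return unique / trials

# Toy data: 100 users, 20 random spatio-temporal points each
rng = random.Random(0)
traces = {u: {(rng.randrange(50), rng.randrange(24)) for _ in range(20)} for u in range(100)}
print(f"estimated unicity with 4 known points: {unicity(traces):.0%}")
```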
A Flexible Method for Protecting Marketing Data: An Application to Point-of-Sale Data
- Computer Science · Marketing Science
- 2018
A Bayesian probability model is proposed that produces protected synthetic data in the context of a business ecosystem in which data providers seek to meet the information needs of data users, but wish to deter invalid use of the data by potential intruders.