Corpus ID: 246275724

AI-based Re-identification of Behavioral Clickstream Data

@article{Vamosi2022AIbasedRO,
  title={AI-based Re-identification of Behavioral Clickstream Data},
  author={Stefan Vamosi and Michaela D. Platzer and Thomas Reutterer},
  journal={ArXiv},
  year={2022},
  volume={abs/2201.10351}
}
AI-based face recognition, i.e., the re-identification of individuals within images, is an already well-established technology for video surveillance, for user authentication, for tagging photos of friends, etc. This paper demonstrates that similar techniques can be applied to successfully re-identify individuals purely based on their behavioral patterns. In contrast to de-anonymization attacks based on record linkage, these methods do not require any overlap in data points between a released…

References

SHOWING 1-10 OF 17 REFERENCES
How To Break Anonymity of the Netflix Prize Dataset
TLDR
This work presents a new class of statistical de-anonymization attacks against high-dimensional micro-data, such as individual preferences, recommendations, transaction records and so on, and demonstrates that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber's record in the dataset.
Unique in the shopping mall: On the reidentifiability of credit card metadata
TLDR
It is shown that four spatiotemporal points are enough to uniquely reidentify 90% of individuals and that knowing the price of a transaction increases the risk of reidentification by 22%, on average.
FaceNet: A unified embedding for face recognition and clustering
TLDR
A system that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity, and achieves state-of-the-art face recognition performance using only 128 bytes per face.
Holdout-Based Empirical Assessment of Mixed-Type Synthetic Data
TLDR
This work introduces a novel holdout-based empirical assessment framework for quantifying the fidelity as well as the privacy risk of synthetic data solutions for mixed-type tabular data, and yields strong evidence that the synthesizer indeed learned to generalize patterns and is independent of individual training records.
They Who Must Not Be Identified - Distinguishing Personal from Non-Personal Data Under the GDPR
TLDR
It is concluded that there always remains a residual risk when anonymisation is used and the concluding section links this conclusion more generally to the notion of risk in the GDPR.
Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
TLDR
This work explores if and how generative adversarial networks can be used to incentivize data sharing by enabling a generic framework for sharing synthetic datasets with minimal expert knowledge and designs a custom workflow called DoppelGANger, which achieves up to 43% better fidelity than baseline models.
Unique in the Crowd: The privacy bounds of human mobility
TLDR
It is found that in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier's antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals.
A Flexible Method for Protecting Marketing Data: An Application to Point-of-Sale Data
TLDR
A Bayesian probability model is proposed that produces protected synthetic data in the context of a business ecosystem in which data providers seek to meet the information needs of data users, but wish to deter invalid use of the data by potential intruders.