Identifying Personal Genomes by Surname Inference

@article{Gymrek2013IdentifyingPG,
  title={Identifying Personal Genomes by Surname Inference},
  author={Melissa Gymrek and Amy L McGuire and David Golan and Eran Halperin and Yaniv Erlich},
  journal={Science},
  year={2013},
  volume={339},
  pages={321--324}
}
Anonymity Compromised. The balance between maintaining individual privacy and sharing genomic information for research purposes has been a topic of considerable controversy. Gymrek et al. (p. 321; see the Policy Forum by Rodriguez et al.) demonstrate that the anonymity of participants (and their families) can be compromised by analyzing Y-chromosome sequences from public genetic genealogy Web sites that contain (sometimes distant) relatives with the same surname. Short tandem repeats (STRs) on…
Identity inference of genomic data using long-range familial searches
TLDR
Testing models of relatedness, Erlich et al. show that many individuals of European ancestry in the United States, even those who have not undergone genetic testing, can be identified on the basis of available genetic information, indicating a need for procedures to help maintain genetic privacy for individuals.
Found your DNA on the web: reconciling privacy and progress.
TLDR
The researchers used surname inferences from commercial genealogy databases and Internet searches to deduce the identity of nearly fifty research participants whose supposedly private data were stored in large, publicly available datasets.
Challenges in Genomic Privacy: An Analysis of Surname Attacks in the Population of Britain
In 2013, Gymrek et al. reported that personal genomes can be re-identified through surname inference using patrilineal information inherent in the Y chromosome. They highlighted that the attack is…
Reconciling Utility with Privacy in Genomics
TLDR
An obfuscation mechanism is proposed that enables genomic data to be made publicly available for research while protecting the genomic privacy of the individuals in a family, together with an extension of the optimization algorithm to cope with the non-linear constraints induced by the correlations between SNPs.
Identification of Anonymous DNA Using Genealogical Triangulation
TLDR
This work presents a “genealogical triangulation” algorithm and shows that for over 50% of targets, their anonymous DNA can be identified (matched to the correct individual or same-sex sibling) when the genetic database includes just 1% of the population.
Genomics: Finding Mr Anonymous
TLDR
Y-STR haplotypes derived from personal whole-genome sequences could be combined with associated demographic data to identify the individual participant in some cases, and could be used to deduce the participant's known surname with a ~12% success rate.
A utility maximizing and privacy preserving approach for protecting kinship in genomic databases
TLDR
Results indicate that concurrently sharing data pertaining to a parent and an offspring poses high risks to kinship privacy, whereas sharing data from more distant relatives together is often safer. The arrival order of family members is also shown to have a high impact on the level of privacy risk and on the utility of sharing data.
Attacks on genetic privacy via uploads to genealogical databases
TLDR
Several methods are described by which an adversary who wants to learn the genotypes of people in the database can do so by uploading multiple datasets, including a proof-of-concept demonstration that the GEDmatch database in particular uses unphased genotypes to detect IBS and is vulnerable to genotypes being revealed by artificial datasets.
Addressing the concerns of the Lacks family: quantification of kin genomic privacy
TLDR
This work formalizes the problem and details an efficient reconstruction attack based on graphical models and belief propagation, and introduces the quantification of health privacy, specifically the measure of how well the predisposition to a disease is concealed from an attacker.
De-anonymizing Genomic Databases Using Phenotypic Traits
TLDR
This paper quantifies, based on various phenotypic traits, the extent of this threat in several scenarios by implementing de-anonymization attacks on a genomic database of OpenSNP users sequenced by 23andMe, and evaluates the adversary’s ability to predict individuals’ predisposition to Alzheimer's disease.

References

Inferential genotyping of Y chromosomes in Latter-Day Saints founders and comparison to Utah samples in the HapMap project.
  • J. Gitschier
  • Geography, Medicine
  • American journal of human genetics
  • 2009
TLDR
This work determines, on the basis of haplotype analysis of the Y chromosome, whether any of the Utahns who contributed to the HapMap project (the "CEU" set) is related to either Joseph Smith or Brigham Young.
Founders, Drift, and Infidelity: The Relationship between Y Chromosome Diversity and Patrilineal Surnames
TLDR
A comparative analysis of published data on Y diversity within Irish surnames demonstrates a relative lack of surname frequency dependence of coancestry, a difference probably mediated through distinct Irish and British demographic histories including even more marked genetic drift in Ireland.
Genetic Signatures of Coancestry within Surnames
TLDR
It is shown that sharing a surname significantly elevates the probability of sharing a Y-chromosomal haplotype and that this probability increases as surname frequency decreases, and that a large surname-based forensic database might contribute to the intelligence-led investigation of up to approximately 70 rapes and murders per year in the UK.
Y-chromosomes and the extent of patrilineal ancestry in Irish surnames
TLDR
Ireland has one of the oldest systems of patrilineal hereditary surnames in the world and there is a substantial role for the Y-chromosome and a molecular genealogical approach to complement and expand existing sources.
From linkage maps to quantitative trait loci: the history and science of the Utah genetic reference project.
TLDR
The families recruited from Utah provided the most widely used samples in the Centre d'Etudes du Polymorphisme Humain set, were instrumental in generating human linkage maps, and often serve as the benchmark for establishing allele frequency when a new variant is identified.
A new statistic and its power to infer membership and phenotype in a genome-wide association study using genotype frequencies
TLDR
Using a likelihood-based statistical framework, an improved statistic is developed that uses genotype frequencies and individual genotypes to infer whether a specific individual or any close relatives participated in the GWAS and, if so, what the participant's phenotype status is.
lobSTR: A short tandem repeat profiler for personal genomes.
TLDR
The speed and reliability of lobSTR exceed the performance of current mainstream algorithms for STR profiling, and the algorithm was used to conduct a comprehensive survey of STR variations in a deeply sequenced personal genome.
Surnames and the Y chromosome.
TLDR
A randomly ascertained sample of males with the surname "Sykes" was typed with four Y-chromosome microsatellites, which points to a single surname founder for extant Sykes males, even though written sources had predicted multiple origins.
A map of human genome variation from population-scale sequencing
TLDR
The pilot phase of the 1000 Genomes Project is presented, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms, and the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants are described.
Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays
TLDR
High-density single nucleotide polymorphism genotyping microarrays are used to demonstrate the ability to accurately and robustly determine whether individuals are in a complex genomic DNA mixture, and suggest future research efforts into assessing the viability of previously sub-optimal DNA sources due to sample contamination.