Major flaws in “Identification of individuals by trait prediction using whole-genome sequencing data”

@article{Erlich2017MajorFI,
  title={Major flaws in “Identification of individuals by trait prediction using whole-genome sequencing data”},
  author={Yaniv Erlich},
  journal={bioRxiv},
  year={2017}
}
Genetic privacy is an area of active research. While it is important to identify new risks, it is equally crucial to supply policymakers with accurate information based on scientific evidence. Recently, Lippert et al. (PNAS, 2017) investigated the status of genetic privacy using trait-predictions from whole genome sequencing. The authors sequenced a cohort of about 1000 individuals and collected a range of demographic, visible, and digital traits such as age, sex, height, face morphology, and a… 

No major flaws in “Identification of individuals by trait prediction using whole-genome sequencing data”

It is shown that not only faces but also a wide range of phenotypes and demographic variables may be derived from DNA; the main contribution of Lippert et al. is an algorithm that identifies the genomes of individuals by combining multiple DNA-based predictive models for a myriad of traits.

Identification of individuals by trait prediction using whole-genome sequencing data

A maximum entropy algorithm is developed that integrates multiple predictions to determine which genomic samples and phenotype measurements originate from the same person; the work may have far-reaching ethical and legal implications.

Idéfix: identifying accidental sample mix-ups in biobanks using polygenic scores

Idéfix, a method for identifying accidental sample mix-ups in biobanks using polygenic scores, is described. It can already be used to identify a high-quality set of participants who are very unlikely to reflect sample mix-ups and who could therefore be offered a pharmacogenetic passport.

Ensuring privacy and security of genomic data and functionalities

The genome privacy problem is discussed and relevant privacy attacks are reviewed, classified into identity tracing, attribute disclosure and completion attacks, which have been used to breach the privacy of an individual.

Artificial Intelligence and the Weaponization of Genetic Data

The ways in which data science is improving genetics are outlined, along with how that can ultimately lead to its weaponization and to the broader social welfare risk associated with bio-warfare.

Machine learning and genomics: precision medicine versus patient privacy

  • Chloé-Agathe Azencott
  • Computer Science
    Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
  • 2018
How breaches in patient privacy can occur is reviewed, recent developments in computational data protection are presented, and how they can be combined with legal and ethical perspectives to provide secure frameworks for genomic data sharing is discussed.

DNA Based Methods in Intelligence - Moving Towards Metagenomics

Existing DNA intelligence tools applied to forensic science and the application of microbial forensics and metagenomics are discussed, along with the challenges and concerns that future developments entail.

Toward a Risk-Utility Data Governance Framework for Research Using Genomic and Phenotypic Data in Safe Havens: Multifaceted Review

A proportionate data governance framework is proposed to promote the safe, socially acceptable use of genomic and phenotypic data in safe havens to safeguard privacy and retain data utility for research.

Toward a Risk-Utility Data Governance Framework for Research Using Genomic and Phenotypic Data in Safe Havens: Multifaceted Review (Preprint)

Recommendations toward a risk-utility model with a flexible suite of controls to safeguard privacy and retain data utility for research in safe havens can contribute toward a proportionate data governance framework that promotes the safe, socially acceptable use of genomic and phenotypic data in safe havens.

Facial recognition from DNA using face-to-DNA classifiers

Another proof of concept for biometric authentication is established by using multiple face-to-DNA classifiers, each classifying given faces by a DNA-encoded aspect (sex, genomic background, individual genetic loci) or by a DNA-inferred aspect (BMI, age).

Identifying Personal Genomes by Surname Inference

It is reported that surnames can be recovered from personal genomes by profiling short tandem repeats on the Y chromosome (Y-STRs) and querying recreational genetic genealogy databases. It is also shown that combining a surname with other types of metadata, such as age and state, can be used to triangulate the identity of the target.

Bayesian method to predict individual SNP genotypes from gene expression data

A Bayesian approach that predicts SNP genotypes from RNA expression data alone is developed, and it is shown that the predicted genotypes can accurately and uniquely identify individuals in large populations.

Defining the role of common variation in the genomic and biological architecture of adult human height

The results indicate a genetic architecture for human height that is characterized by a very large but finite number of causal variants, and they implicate genes and pathways including mTOR, osteoglycin and binding of hyaluronic acid.

Routes for breaching and protecting genetic privacy

An overview of genetic privacy breaching strategies is presented, outlining the principles of each technique, the underlying assumptions, and their technological complexity and maturation, as well as highlighting different cases that are relevant to genetic applications.