Learn More
We are entering an era of ubiquitous genetic information for research, clinical care and personal curiosity. Sharing these data sets is vital for progress in biomedical research. However, a growing concern is the ability to protect the genetic privacy of the data originators. Here, we present an overview of genetic privacy breaching strategies. We outline(More)
Next-generation sequencing is limited to short read lengths and by high error rates. We systematically analyzed sources of noise in the Illumina Genome Analyzer that contribute to these high error rates and developed a base caller, Alta-Cyclic, that uses machine learning to compensate for noise factors. Alta-Cyclic substantially improved the number of(More)
Next-generation sequencers have sufficient power to analyze simultaneously DNAs from many different specimens, a practice known as multiplexing. Such schemes rely on the ability to associate each sequence read with the specimen from which it was derived. The current practice of appending molecular barcodes prior to pooling is practical for parallel analysis(More)
Sharing sequencing data sets without identifiers has become a common practice in genomics. Here, we report that surnames can be recovered from personal genomes by profiling short tandem repeats on the Y chromosome (Y-STRs) and querying recreational genetic genealogy databases. We show that a combination of a surname with other types of metadata, such as age(More)
Short tandem repeats (STRs) have a wide range of applications, including medical genetics, forensics, and genetic genealogy. High-throughput sequencing (HTS) has the potential to profile hundreds of thousands of STR loci. However, mainstream bioinformatics pipelines are inadequate for the task. These pipelines treat STR mapping as gapped alignment, which(More)
— Significant volumes of knowledge have been accumulated in recent years linking subtle genetic variations to a wide variety of medical disorders from Cystic Fibrosis to mental retardation. Nevertheless, there are still great challenges in applying this knowledge routinely in the clinic, largely due to the relatively tedious and expensive process of DNA(More)
Fulfilling the promise of the genetic revolution requires the analysis of large datasets containing information from thousands to millions of participants. However, sharing human genomic data requires protecting subjects from potential harm. Current models rely on de-identification techniques in which privacy versus data utility becomes a zero-sum game.(More)
MOTIVATION Despite the rapid decline in sequencing costs, sequencing large cohorts of individuals is still prohibitively expensive. Recently, several sophisticated pooling designs were suggested that can identify carriers of rare alleles in large cohorts with a significantly smaller number of pools, thus dramatically reducing the cost of such large-scale(More)
Despite representing an important source of genetic variation, tandem repeats (TRs) remain poorly studied due to technical difficulties. We hypothesized that TRs can operate as expression (eQTLs) and methylation (mQTLs) quantitative trait loci. To test this we analyzed the effect of variation at 4849 promoter-associated TRs, genotyped in 120 individuals, on(More)