Genome-wide profiling of heritable and de novo STR variations

  • Thomas Willems, Dina Zielinski, Jie Yuan, Assaf Gordon, Melissa Gymrek, Yaniv Erlich
  • Nature Methods
  • pp. 590-592
Short tandem repeats (STRs) are highly variable elements that play a pivotal role in multiple genetic diseases, population genetics applications, and forensic casework. However, it has proven problematic to genotype STRs from high-throughput sequencing data. Here, we describe HipSTR, a novel haplotype-based method for robustly genotyping and phasing STRs from Illumina sequencing data, and we report a genome-wide analysis and validation of de novo STR mutations. HipSTR is freely available at… 

A genomic view of short tandem repeats.

  • M. Gymrek
  • Biology
    Current opinion in genetics & development
  • 2017

Population-level genome-wide STR discovery and validation for population structure and genetic diversity assessment of Plasmodium species

STRs were genotyped using HipSTR in published whole-genome sequence data from more than 3,000 Plasmodium falciparum and 174 Plasmodium vivax samples collected across the globe, and a novel method for quality control of STR genotype calls was developed.

Population-level genome-wide STR typing in Plasmodium species reveals higher resolution population structure and genetic diversity relative to SNP typing

Identifying highly informative STR markers from large numbers of population samples is a powerful approach to studying the genetic diversity, population structure, and genomic signatures of selection in P. falciparum and P. vivax. A multivariable logistic regression model is also developed to measure and predict the quality of STRs.

Interpreting short tandem repeat variations in humans using mutational constraint

This work harnessed bioinformatics tools and a novel analytical framework to estimate mutation parameters for each STR in the human genome by correlating STR genotypes with local sequence heterozygosity and used these estimates to create a framework for measuring constraint at STRs by comparing observed versus expected mutation rates.

A reference haplotype panel for genome-wide imputation of short tandem repeats

A SNP+STR haplotype reference panel is provided that allows imputation of STRs from SNP array data and will enable the first large-scale STR association studies across a range of complex traits.

Polymorphic short tandem repeats make widespread contributions to blood and serum traits

This work suggests that polymorphic tandem repeats make widespread contributions to complex traits, provides a set of stringently selected candidate causal STRs, and demonstrates the need to routinely consider a more complete view of human genetic variation in GWAS.

A comparison of software for analysis of rare and common short tandem repeat (STR) variation using human genome sequences from clinical and population-based samples

Analysis of shared genomic sequence data provided by the GIAB consortium and of 1000 Genomes Project sequencing data allows new software approaches to be compared on a common set of data in the future, helping researchers choose the software that best fulfils their needs.

Open-Access STRS Database Of Populations From The 1000 Genomes Project Using High Coverage Phase 3 Data

This set of analyses revealed that, except for larger Penta D and Penta E alleles, the allele frequencies and genotypes defined by HipSTR from the 1000 Genomes Project phase 3 data and offered as an open-access database are consistent and highly reliable.

Genetics and Molecular Biology

The main applications of MPS in forensic genetics are discussed, including metagenomics, which analyzes genetic material from a microbial community to obtain information for individual identification, post-mortem interval estimation, geolocation inference, and substrate analysis.

Open-Access Worldwide Population STR Database Constructed Using High-Coverage Massively Parallel Sequencing Data Obtained from the 1000 Genomes Project

A comprehensive genotyping analysis of a set of STRs of broad forensic interest obtained from 1000 Genomes Project populations revealed that the allele frequencies and genotypes offered as an open-access database are consistent and reliable.



Accurate typing of short tandem repeats from genome-wide sequencing data and its applications

STR-FM (short tandem repeat profiling using flank-based mapping) is developed: a computational pipeline that can detect the full spectrum of STR alleles from short-read data, can adapt to emerging read-mapping algorithms, and can be applied to heterogeneous genetic samples.

Accurate human microsatellite genotypes from high-throughput resequencing data using informed error profiles

A tool for genotyping microsatellite repeats called RepeatSeq is presented, which uses Bayesian model selection guided by an empirically derived error model that incorporates sequence and read properties.

The landscape of human STR variation

This study reports the largest-scale analysis of human STR variation to date, collecting information for nearly 700,000 STR loci across over 1,000 individuals in phase 1 of the 1000 Genomes Project and utilizing this call set to analyze determinants of STR variation, assess the human reference genome’s representation of STR alleles, and obtain initial estimates of the linkage disequilibrium between STRs and common SNPs.

lobSTR: A short tandem repeat profiler for personal genomes.

The speed and reliability of lobSTR exceed the performance of current mainstream algorithms for STR profiling, and the algorithm was used to conduct a comprehensive survey of STR variations in a deeply sequenced personal genome.

Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications

The performance of Platypus is demonstrated by comparing with SAMtools and GATK on whole-genome and exome-capture data, by identifying de novo variation in 15 parent-offspring trios with high sensitivity and specificity, and by estimating human leukocyte antigen genotypes directly from variant calls.

Genome-wide patterns and properties of de novo mutations in humans

It is shown that de novo mutations in the offspring of older fathers are not only more numerous but also occur more frequently in early-replicating, genic regions, providing a genome-wide mutation rate map for medical and population genetics applications.

Abundant contribution of short tandem repeats to gene expression variation in humans

A genome-wide survey of the contribution of short tandem repeats (STRs), which constitute one of the most polymorphic and abundant repeat classes, to gene expression in humans found that eSTRs are enriched in various clinically relevant conditions and may modulate certain histone modifications.

A global reference for human genetic variation

  • Taras K. Oleksyk, Adam Auton, Gonçalo R. Abecasis, David M. Altshuler, Richard M. Durbin, David R. Bentley, Shane A. McCarthy
  • Biology
  • 2015
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Comprehensive variation discovery in single human genomes

It is shown that the combination of new methods and improved data increases sensitivity by several fold, with the greatest impact in challenging regions of the human genome.