• Publications
Fast and accurate short read alignment with Burrows–Wheeler transform
We implemented the Burrows–Wheeler Alignment tool (BWA), a new read alignment package based on backward search with the Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
The Sequence Alignment/Map format and SAMtools
Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads produced by different sequencing platforms.
The variant call format and VCFtools
We propose the variant call format (VCF) as a standardized format for storing the most prevalent types of sequence variation, including SNPs, indels and larger structural variants, together with rich annotations.
Fast and accurate long-read alignment with Burrows–Wheeler transform
We designed and implemented a new algorithm, Burrows–Wheeler Aligner's Smith–Waterman Alignment (BWA-SW), to align long sequences of up to 1 Mb against a large sequence database with a few gigabytes of memory.
A global reference for human genetic variation
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple…
The Pfam protein families database
Pfam is a database of curated protein families, each of which is defined by two alignments and a profile hidden Markov model (HMM).
Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids
Probabilistic models are becoming increasingly important in analyzing the huge amount of data being produced by large-scale DNA-sequencing efforts such as the Human Genome Project.
Initial sequencing and analysis of the human genome
The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and…
An integrated map of genetic variation from 1,092 human genomes
By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here…
A map of human genome variation from population-scale sequencing
The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present…