The variant call format and VCFtools

Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a…
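To make the description of the format concrete, here is a minimal Python sketch that reads the fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and selects variants in a position range. It is not how VCFtools works: the names `Variant`, `parse_vcf_line`, `variants_in_range` and `SAMPLE_VCF` are hypothetical, the sample records are invented, and the range lookup is a plain linear scan rather than the index-based retrieval the summary mentions.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int      # 1-based position on the reference
    id: str
    ref: str
    alts: list    # ALT is a comma-separated list in the file
    info: dict    # INFO key/value pairs; bare flags map to True


def parse_vcf_line(line):
    """Parse one tab-delimited VCF data line (not a header line)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt = fields[:5]
    info_field = fields[7] if len(fields) > 7 else "."
    info = {}
    if info_field != ".":
        for item in info_field.split(";"):
            key, sep, value = item.partition("=")
            info[key] = value if sep else True
    return Variant(chrom, int(pos), vid, ref, alt.split(","), info)


def variants_in_range(lines, chrom, start, end):
    """Yield variants on `chrom` with start <= POS <= end (linear scan, no index)."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, the column header, and blank lines
        variant = parse_vcf_line(line)
        if variant.chrom == chrom and start <= variant.pos <= end:
            yield variant


# Invented records, for illustration only.
SAMPLE_VCF = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "20\t14370\trsExample1\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB\n",
    "20\t1110696\t.\tA\tG,T\t67\tPASS\tNS=2;DP=10\n",
]

for v in variants_in_range(SAMPLE_VCF, "20", 1, 20000):
    print(v.chrom, v.pos, v.ref, v.alts, v.info)
```

For real files, which are usually compressed and indexed, a dedicated library would be the better choice; this sketch only shows the column layout.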
VCFdbR: A method for expressing biobank-scale Variant Call Format data in a SQLite database using R
This work proposes a pipeline for converting VCFs to simple SQLite databases, which allow rapid searching and filtering of genetic variants while minimizing memory overhead.
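The general idea of loading VCF records into SQLite can be sketched as follows. VCFdbR itself is written in R; this example uses Python's standard `sqlite3` module instead, and the table layout and the function names `load_variants` and `query_region` are assumptions made for illustration, not the schema or API of VCFdbR.

```python
import sqlite3


def load_variants(conn, vcf_lines):
    """Insert the fixed columns of each VCF data line; one row per ALT allele."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS variants "
        "(chrom TEXT, pos INTEGER, id TEXT, ref TEXT, alt TEXT)"
    )
    rows = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, vid, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            rows.append((chrom, int(pos), vid, ref, allele))
    conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?)", rows)
    # An index on (chrom, pos) lets region queries avoid a full table scan.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_variants_pos ON variants (chrom, pos)"
    )
    conn.commit()
    return len(rows)


def query_region(conn, chrom, start, end):
    """Return (chrom, pos, ref, alt) rows with start <= pos <= end."""
    cursor = conn.execute(
        "SELECT chrom, pos, ref, alt FROM variants "
        "WHERE chrom = ? AND pos BETWEEN ? AND ? ORDER BY pos",
        (chrom, start, end),
    )
    return cursor.fetchall()
```

Because the query runs inside SQLite, only the matching rows are returned to the caller, which is the memory-saving property the summary describes.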
VCF-kit: assorted utilities for the variant call format
VCF-kit adds essential utilities to process and analyze VCF files, including primer generation for variant validation, dendrogram production, genotype imputation from sequence data in linkage studies, and additional tools.
cyvcf2: fast, flexible variant analysis with Python
Motivation: Variant call format (VCF) files document the genetic variation observed after DNA sequencing, alignment and variant calling of a sample cohort. Given the complexity of the VCF format as…
Vcflib and tools for processing the VCF variant call format
Over 125 widely used free and open source software tools and libraries are presented. They are part of vcflib tools and bio-vcf and are used in biomedical sequencing workflows around the world.
Improved VCF normalization for accurate VCF comparison
A VCF normalization method called Best Alignment Normalisation (BAN) is introduced that results in more accurate VCF file comparison and is defined as the one resulting in less disagreement between the outputs of different VCF comparators.
cyvcf2: fast, flexible variant analysis with Python
This work introduces cyvcf2, a Python library and software package for fast parsing and querying of VCF and BCF files and illustrates its speed, simplicity and utility.
Unified representation of genetic variants
A software tool, vt normalize, is presented that normalizes the representation of genetic variants in VCF. The work demonstrates the inconsistent representation of variants across existing sequence analysis tools and shows that the tool facilitates integration of diverse variant types and call sets.
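As an illustration of what normalizing a variant representation can involve, the sketch below trims shared bases and shifts an indel as far left as possible, so that equivalent representations of the same change collapse to one. It is a simplified version for a single REF/ALT pair under stated assumptions, not the vt implementation: the function name `normalize` is hypothetical, the reference is passed as an in-memory string, and positions are 1-based.

```python
def normalize(ref_seq, pos, ref, alt):
    """Left-align and trim one biallelic variant.

    ref_seq: full reference sequence of the chromosome (string)
    pos:     1-based position of the first base of `ref`
    Returns a (pos, ref, alt) tuple in normalized form.
    """
    if ref == alt:
        raise ValueError("REF and ALT are identical; nothing to normalize")
    alleles = [ref, alt]
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base from both alleles.
        if all(alleles) and alleles[0][-1] == alleles[1][-1]:
            alleles = [a[:-1] for a in alleles]
            changed = True
        # If an allele became empty, extend both one base to the left.
        if any(len(a) == 0 for a in alleles):
            if pos == 1:
                raise ValueError("cannot extend left of the first reference base")
            pos -= 1
            left_base = ref_seq[pos - 1]
            alleles = [left_base + a for a in alleles]
            changed = True
    # Drop shared leading bases while both alleles keep at least one base.
    while all(len(a) >= 2 for a in alleles) and alleles[0][0] == alleles[1][0]:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles[0], alleles[1]


# A CA deletion inside a CA repeat, written at position 8, moves to position 3.
print(normalize("GGGCACACACAGGG", 8, "CAC", "C"))   # (3, 'GCA', 'G')
```

Multi-allelic records and variants at the very start of a chromosome need extra handling that this sketch leaves out.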
Variant Tool Chest: an improved tool to analyze and manipulate variant call format (VCF) files
New software called the Variant Tool Chest (VTC) provides much-needed tools for working with VCF files, with new functionality that complements and integrates well with existing software.
Improved VCF Normalization for Evaluation of DNA Variant Calling Algorithms
Variant Call Format (VCF) is widely used to store data about genetic variations. Applications include evaluation of variant calling workflows and the study of the similarity of individual variations,…
vcfr: a package to manipulate and visualize variant call format data in R
The R package vcfr provides essential, novel tools not currently available in R to facilitate VCF data exploration, including intuitive methods for data quality control and easy export to other R packages for further analysis.


The Sequence Alignment/Map format and SAMtools
Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by…
A standard variation file format for human genome sequences
The Genome Variation Format (GVF), an extension of Generic Feature Format version 3 (GFF3), is a simple tab-delimited format for DNA variant files, which uses Sequence Ontology to describe genome variation data.
The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.
The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
A map of human genome variation from population-scale sequencing
The pilot phase of the 1000 Genomes Project is presented, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms, and the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants are described.
