Data Set Used
To evaluate and compare the performance of variant calling methods and their confidence scores, comparisons between a test call set and a "gold standard" need to be carried out. Unfortunately, these comparisons are not straightforward with the current Variant Call Files (VCF), which are the standard output of most variant calling algorithms for…
A technique to implement error detection as part of the arithmetic coding process is described. Heuristic arguments are given to show that a small amount of extra redundancy can be very effective in detecting errors very quickly, and practical tests confirm this prediction.
The analysis of whole-genome or exome sequencing data from trios and pedigrees has been successfully applied to the identification of disease-causing mutations. However, most methods used to identify and genotype genetic variants from next-generation sequencing data ignore the relationships between samples, resulting in significant Mendelian errors, false…
SureChEMBL is a publicly available large-scale resource containing compounds extracted from the full text, images and attachments of patent documents. The data are extracted from the patent literature according to an automated text and image-mining pipeline on a daily basis. SureChEMBL provides access to a previously unavailable, open and timely set of…
reversible lossless text transform to improve compression performance", "LIPT: A lossless text transform to improve compression".