Genetic factors underlying common disease are largely unknown. Discovery of disease-causing genes will transform our knowledge of the genetic contribution to human disease, lead to new genetic screens, and underpin research into new cures and improved lifestyles. The sequencing of the human genome has catalyzed efforts to search for disease genes by the …
We describe and test a Markov chain model of microsatellite evolution that can explain the different distributions of microsatellite lengths across different organisms and repeat motifs. Two key features of this model are the dependence of mutation rates on microsatellite length and a mutation process that includes both strand slippage and point mutation …
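The two features named in this abstract, length-dependent mutation rates and a mutation process combining strand slippage with point mutation, can be illustrated with a toy simulation. This is a minimal sketch, not the published model: the names `step`, `simulate`, `slip_rate` and `point_rate` are hypothetical, and the linear rate scaling, single-unit slippage steps and keep-the-longer-fragment rule are simplifying assumptions made here for illustration.

```python
import random


def step(n, slip_rate, point_rate, rng):
    """Advance a repeat count n (>= 1) by one generation.

    Assumption: both event probabilities grow linearly with n,
    so longer repeat arrays mutate more often.
    """
    p_slip = min(1.0, slip_rate * n)
    p_point = min(1.0 - p_slip, point_rate * n)
    u = rng.random()
    if u < p_slip:
        # Strand slippage: gain or lose one repeat unit (never below 1).
        return n + 1 if rng.random() < 0.5 else max(1, n - 1)
    if u < p_slip + p_point:
        # Point mutation: one unit is interrupted, splitting the array.
        # Assumption: only the longer fragment is tracked afterwards.
        cut = rng.randrange(n)
        return max(1, cut, n - 1 - cut)
    return n


def simulate(n0, steps, slip_rate, point_rate, seed=0):
    """Return the trajectory of repeat counts, starting from n0."""
    rng = random.Random(seed)
    trajectory = [n0]
    for _ in range(steps):
        trajectory.append(step(trajectory[-1], slip_rate, point_rate, rng))
    return trajectory


if __name__ == "__main__":
    traj = simulate(n0=5, steps=10_000, slip_rate=0.01, point_rate=0.001)
    print("final repeat count:", traj[-1], "max seen:", max(traj))
```

In this sketch slippage tends to lengthen and shorten the array symmetrically, while point mutations only shorten it, so the balance between `slip_rate` and `point_rate` controls how long the arrays typically get.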
We have developed a simple and efficient algorithm to identify each member of a large collection of DNA-linked objects through the use of hybridization, and have applied it to the manufacture of randomly assembled arrays of beads in wells. Once the algorithm has been used to determine the identity of each bead, the microarray can be used in a wide variety …
We fit a Markov chain model of microsatellite evolution introduced by Kruglyak et al. to data on all di-, tri-, and tetranucleotide repeats in the yeast genome. Our results suggest that many features of the distribution of abundance and length of microsatellites can be explained by this simple model, which incorporates a competition between slippage events …
Summary: An ultrafast DNA sequence aligner (Isaac Genome Alignment Software) that takes advantage of high-memory hardware (>48 GB) and a variant caller (Isaac Variant Caller) have been developed. We demonstrate that our combined pipeline (Isaac) is four to five times faster than BWA + GATK on equivalent hardware, with comparable accuracy as measured by trio …
With the advent of next-generation sequencing technologies, the cost of sequencing whole genomes is poised to go below $1000 per human individual in a few years. As more and more genomes are sequenced, analysis methods are undergoing rapid development, making it tempting to store sequencing data for long periods of time so that the data can be re-analyzed …
We describe Manta, a method to discover structural variants and indels from next-generation sequencing data. Manta is optimized for rapid clinical analysis, calling structural variants, medium-sized indels and large insertions on standard compute hardware in less than a tenth of the time that comparable methods require to identify only subsets of these …
Many expression array experiments monitor gene activity as an organism goes through some biological process. It is desirable to find genes with similar expression patterns in the resulting time series data. We propose a new simulation approach that assesses the statistical significance of similarity scores between expression patterns. The simulation takes …
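The general idea of judging a similarity score against simulated data can be sketched with a generic permutation test. The abstract is truncated before it describes the authors' simulation, so this is a swapped-in standard technique and not their method. The function names, the choice of Pearson correlation as the similarity score, and the shuffling of time points as the null model are all assumptions made here for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Estimate how often a shuffled copy of y scores at least as high as y itself.

    Returns (observed_score, one_sided_p_value).
    """
    rng = random.Random(seed)
    observed = pearson(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # null model: time order carries no information
        if pearson(x, shuffled) >= observed:
            hits += 1
    # The +1 terms keep the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    gene_a = [0.1, 0.4, 0.9, 1.6, 2.4, 3.1, 3.3, 3.0]
    gene_b = [0.0, 0.5, 1.1, 1.5, 2.6, 2.9, 3.4, 3.2]
    score, p = permutation_p_value(gene_a, gene_b)
    print(f"similarity={score:.3f}, p={p:.4f}")
```

Shuffling time points ignores the autocorrelation that real time series have, so this null model can overstate significance; it is one plausible reason a purpose-built simulation would be needed.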
We describe Manta, a method to discover structural variants and indels from next generation sequencing data. Manta is optimized for rapid germline and somatic analysis, calling structural variants, medium-sized indels and large insertions on standard compute hardware in less than a tenth of the time that comparable methods require to identify …