Variance component model to account for sample structure in genome-wide association studies
This work reports a variance component approach, implemented in the publicly available software EMMA eXpedited (EMMAX), that reduces the computational time for analyzing large GWAS data sets from years to hours.
Assessing computational tools for the discovery of transcription factor binding sites
The purpose of the current assessment is to provide some guidance to users regarding the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.
Efficient Control of Population Structure in Model Organism Association Mapping
A new method, efficient mixed-model association (EMMA), corrects for population structure and genetic relatedness in model organism association mapping. It takes advantage of the specific nature of the optimization problem in applying mixed models for association mapping, which substantially increases the computational speed and the reliability of the results.
Mouse genomic variation and its effect on phenotypes and gene regulation
These sequences provide a starting point for a new era in the functional analysis of a key model organism and show that the molecular nature of functional variants and their position relative to genes vary according to the effect size of the locus.
The Spectrum Kernel: A String Kernel for SVM Protein Classification
A new sequence-similarity kernel, the spectrum kernel, is introduced for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem and performs well in comparison with state-of-the-art methods for homology detection.
Whole-Genome Patterns of Common DNA Variation in Three Human Populations
This work has characterized whole-genome patterns of common human DNA variation by genotyping 1,586,383 single-nucleotide polymorphisms (SNPs) in 71 Americans of European, African, and Asian ancestry and indicates that these SNPs capture most common genetic variation as a result of linkage disequilibrium.
Data mining methods for detection of new malicious executables
This work presents a data mining framework that detects new, previously unseen malicious executables accurately and automatically, and more than doubles the current detection rates for new malicious executables.
A sequence-based variation map of 8.27 million SNPs in inbred mouse strains
A dense map of genetic variation in the laboratory mouse genome will provide insights into the evolutionary history of the species and lead to an improved understanding of the relationship between…
Mismatch String Kernels for SVM Protein Classification
A class of string kernels, called mismatch kernels, is introduced for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem; the mismatch kernel used with an SVM classifier is shown to perform as well as the Fisher kernel, the most successful method for remote homology detection.