BACKGROUND: Like microarray-based investigations, high-throughput proteomics techniques require machine learning algorithms to identify biomarkers that are informative for biological classification problems. Feature selection and classification algorithms need to be robust to noise and outliers in the data. RESULTS: We developed a recursive support vector ...
mSin3A is a core component of a large multiprotein corepressor complex with associated histone deacetylase (HDAC) enzymatic activity. Physical interactions of mSin3A with many sequence-specific transcription factors have linked the mSin3A corepressor complex to the regulation of diverse signaling pathways and associated biological processes. To dissect the ...
Complex interactions between genes or proteins contribute substantially to phenotypic evolution. We present a probabilistic model and a maximum likelihood approach for cross-species clustering analysis and for identification of conserved as well as species-specific co-expression modules. This model enables a "soft" cross-species clustering (SCSC) approach ...
BACKGROUND: Many statistical algorithms combine microarray expression data and genome sequence data to identify transcription factor binding motifs in lower eukaryotic genomes. Finding cis-regulatory elements in higher eukaryotic genomes, however, remains a challenge, as searching in the promoter regions of genes with similar expression patterns often ...
Learning the structure of Bayesian networks (BNs) is known to be NP-complete, and most recent work in the field is based on heuristics. Many recent approaches to the problem trade correctness and exactness for faster computation, yet remain computationally infeasible except for networks with few variables. In this paper we present a ...
Structural variations (SVs) are large genomic rearrangements that vary significantly in size, making them challenging to detect with the relatively short reads from next-generation sequencing (NGS). Different SV detection methods have been developed; however, each is limited to specific kinds of SVs with varying accuracy and resolution. Previous ...
ParaLearn is a scalable, parallel FPGA-based system for learning interaction networks using Bayesian statistics. ParaLearn includes problem-specific parallel and scalable algorithms, system software, and a hardware architecture to address this complex problem. Learning interaction networks from data uncovers causal relationships and allows scientists to predict ...
BACKGROUND: Collecting and managing information is a challenging task in a genome-wide profiling research project. Most databases and online computational tools require direct human involvement. Information and computational results are presented in various multimedia formats (e.g., text, image, PDF, Word files), many of which cannot be automatically ...
SUMMARY: VarSim is a framework for assessing alignment and variant calling accuracy in high-throughput genome sequencing through simulation or real data. In contrast to simulating a random mutation spectrum, it synthesizes diploid genomes with germline and somatic mutations based on a realistic model. This model leverages information such as previously ...