MOTIVATION An important aspect of infectious disease research involves understanding the differences and commonalities in the infection mechanisms underlying various diseases. Systems biology-based approaches study infectious diseases by analyzing the interactions between the host species and the pathogen organisms. This work aims to combine the knowledge…
We consider the problem of building a predictive model for host-pathogen protein interactions when no known interactions are available. Our goal is to predict the protein-protein interactions (PPIs) between the plant host Arabidopsis thaliana and the bacterial species Salmonella Typhimurium. Our method, based on transfer learning, utilizes labeled…
We explore a transfer learning setting in which a finite sequence of target concepts is sampled independently from an unknown distribution belonging to a known family. We study the total number of labeled examples required to learn all targets to an arbitrary specified expected accuracy, focusing on the asymptotics in the number of tasks and the desired accuracy…
BACKGROUND Human immunodeficiency virus-1 (HIV-1) has a minimal genome of only 9 genes, which encode 15 proteins. HIV-1 thus depends on the human host for virtually every aspect of its life cycle. The universal language of communication in biological systems, including communication between pathogen and host, is signal transduction through signaling pathways. The fundamental units of…
MOTIVATION Approaches that use supervised machine learning techniques for protein-protein interaction (PPI) prediction typically use features obtained by integrating several sources of data. Often, certain attributes of the data are not available, resulting in missing values. In particular, our host-pathogen PPI datasets have a large fraction, in the range…
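The abstract is cut off before it describes how the missing values are handled. For orientation only, the sketch below shows a common generic baseline: column-wise mean imputation plus a 0/1 "was missing" indicator per feature. It is an assumption for illustration and is not the method proposed in the paper; the function name impute_with_indicators is hypothetical.

```python
import numpy as np


def impute_with_indicators(X):
    """Generic baseline (not the paper's method): mean-impute NaN entries
    column by column and append one 0/1 indicator column per feature that
    marks where a value was originally missing."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Column means over observed entries only; an all-missing column defaults to 0.
    counts = (~missing).sum(axis=0)
    sums = np.where(missing, 0.0, X).sum(axis=0)
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    # Replace each NaN with its column mean (means broadcasts across rows).
    filled = np.where(missing, means, X)
    return np.hstack([filled, missing.astype(float)])
```

For example, with two PPI feature vectors where the second feature of the first pair is missing, impute_with_indicators([[1.0, float("nan")], [3.0, 4.0]]) fills the gap with 4.0 and flags it in the appended indicator columns, so a downstream classifier can still learn from the fact that the value was absent.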
Given an author-paper-conference graph, how can we automatically find groups of authors, papers, and conferences, respectively? Existing work either (1) requires fine tuning of several parameters, or (2) can only be applied to bipartite graphs (e.g., an author-paper graph or a paper-conference graph). To address this problem, in this paper, we propose PaCK for…
Proactive Learning is a generalized form of active learning with multiple oracles exhibiting different reliabilities (label noise) and costs. We propose a general approach for Proactive Learning that explicitly addresses the cost vs. reliability tradeoff for oracle and instance selection. We formulate the problem in the PAC learning framework with bounded…
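To make the cost vs. reliability tradeoff concrete, here is a minimal sketch of joint instance and oracle selection. It assumes a hypothetical utility, value × reliability − cost_weight × cost, where value is some informativeness score per unlabeled instance (for example, model uncertainty). This scoring rule, the Oracle class, and select_query are illustrative assumptions, not the PAC-based formulation in the paper.

```python
from dataclasses import dataclass


@dataclass
class Oracle:
    name: str
    reliability: float  # assumed probability that this oracle's label is correct
    cost: float         # assumed cost of one query to this oracle


def select_query(instance_values, oracles, cost_weight=1.0):
    """Return (instance_index, oracle_name) maximizing the hypothetical
    utility value * reliability - cost_weight * cost over all pairs."""
    best = None
    best_utility = float("-inf")
    for i, value in enumerate(instance_values):
        for oracle in oracles:
            # Reliability scales the expected benefit; cost is a penalty.
            utility = value * oracle.reliability - cost_weight * oracle.cost
            if utility > best_utility:
                best_utility = utility
                best = (i, oracle.name)
    return best
```

With a reliable but expensive "expert" (reliability 0.95, cost 0.5) and a cheap, noisy "crowd" (reliability 0.7, cost 0.1), the most informative instance is routed to the crowd when cost_weight is 1.0, and to the expert once cost_weight is lowered to 0.1. The weight is what trades label quality against spending.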
Rare categories are becoming increasingly abundant, yet their characterization has received little attention thus far. Fraudulent banking transactions, network intrusions, and rare diseases are examples of rare classes whose detection and characterization are of high value. However, accurate characterization is challenging due to high skewness and non-separability…
We consider the problem of building a model to predict protein-protein interactions (PPIs) between the bacterial species Salmonella Typhimurium and the plant host Arabidopsis thaliana, a host-pathogen pair for which no known PPIs are available. To achieve this, we present approaches that use homology and statistical learning methods called…
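The homology-based idea can be illustrated with a minimal "interolog" transfer sketch. It assumes we already have a set of known interactions from some other host-pathogen pair and precomputed homolog mappings from each target protein (Arabidopsis or Salmonella) to source proteins. A target pair is predicted to interact if any pair of their source homologs is known to interact. The data layout, the function name, and the protein IDs in the example are hypothetical; the abstract does not specify how homology is used in the actual method.

```python
def transfer_interactions(known_pairs, host_homologs, pathogen_homologs):
    """Predict target host-pathogen PPIs by homology ("interolog") transfer.

    known_pairs: set of (source_host_protein, source_pathogen_protein)
        interactions known in some other host-pathogen system.
    host_homologs: dict mapping each target host protein to the set of its
        homologs among the source host proteins.
    pathogen_homologs: the same mapping for target pathogen proteins.
    """
    predicted = set()
    for h_target, h_sources in host_homologs.items():
        for p_target, p_sources in pathogen_homologs.items():
            # Transfer the interaction if any homolog pair is known to interact.
            if any((h, p) in known_pairs for h in h_sources for p in p_sources):
                predicted.add((h_target, p_target))
    return predicted
```

For example, if source proteins H1 and P1 interact, and hypothetical target proteins AT1 and ST1 are homologous to H1 and P1 respectively, the sketch predicts the pair (AT1, ST1). A pure homology rule like this can only recover interactions that have close homologs in the source data, which is one motivation for combining it with statistical learning.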