Alexander R. Statnikov

MOTIVATION: Cancer diagnosis is one of the most important emerging clinical applications of gene expression microarray technology. We are seeking to develop a computer system for powerful and reliable cancer diagnostic model creation based on microarray data. To keep a realistic perspective on clinical applications we focus on multicategory diagnosis. To...
This paper presents a number of new algorithms for discovering the Markov Blanket of a target variable T from training data. The Markov Blanket can be used for variable selection for classification, for causal discovery, and for Bayesian Network learning. We introduce a low-order polynomial algorithm and several variants that soundly induce the Markov...
We present an algorithmic framework for learning local causal structure around target variables of interest in the form of direct causes/effects and Markov blankets applicable to very large data sets with relatively small samples. The selected feature sets can be used for causal discovery and classification. The framework (Generalized Local Learning, or...
We introduce a novel, sound, sample-efficient, and highly scalable algorithm for variable selection for classification, regression and prediction called HITON. The algorithm works by inducing the Markov Blanket of the variable to be classified or predicted. A wide variety of biomedical tasks with different characteristics were used for an...
Data Mining with Bayesian Network learning has two important characteristics: first, under certain conditions, learned edges between variables correspond to causal influences, and second, for every variable T in the network a special subset (Markov Blanket) identifiable by the network is the minimal variable set required to predict T. However, all known...
Causal Probabilistic Networks (CPNs), also known as Bayesian Networks or Belief Networks, are well-established representations in biomedical applications such as decision support systems and predictive modeling or mining of causal hypotheses. CPNs (a) have well-developed theory for induction of causal relationships, and (b) are suitable for creating sound and...
In Part I of this work we introduced and evaluated the Generalized Local Learning (GLL) framework for producing local causal and Markov blanket induction algorithms. In the present second part we analyze the behavior of GLL algorithms and provide extensions to the core methods. Specifically, we investigate the empirical convergence of GLL to the true local...
Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of gene expression microarray technology, with several molecular signatures on their way toward clinical deployment. Use of the most accurate classification algorithms available for microarray gene expression data is a critical ingredient in order to develop...
For WCCI 2008, we organized a challenge to evaluate causal modeling techniques, focusing on predicting the effect of “interventions” performed by an external agent. Examples of that problem are found in the medical domain, to predict the effect of a drug prior to administering it, or in econometrics, to predict the effect of a new policy prior to issuing it...
Algorithms for Markov boundary discovery from data constitute an important recent development in machine learning, primarily because they offer a principled solution to the variable/feature selection problem and give insight into local causal structure. Over the last decade many sound algorithms have been proposed to identify a single Markov boundary of the...