Determining the Number of Non-Spurious Arcs in a Learned DAG Model

Abstract

In many application areas where the structure of a graphical model is learned from data, the end goal is neither prediction nor density estimation but the uncovering of discrete relationships between entities. For example, in computational biology, one may want to discover which proteins within a large set interact with one another. In these problems, relationships can be represented by arcs in a graphical model. Consequently, given a learned model, we are interested in knowing how many of the arcs are real, or non-spurious.

In our approach to this problem, we estimate and control the False Discovery Rate (FDR) [1] of a set of arc hypotheses. The FDR is defined as the (expected) proportion of all hypotheses (e.g., arc hypotheses) which we label as true, but which are actually false (i.e., the number of false positives divided by the total number of hypotheses called true). In our evaluations, we concentrate on directed acyclic graphs (DAGs) for discrete variables with known variable orderings, as our problem of interest, which relates to HIV vaccine design, has these properties. We use the term arc hypothesis to denote the event that an arc is present in the underlying distribution of the data.

In a typical computation of FDR, we are given a set of hypotheses where each hypothesis, i, is assigned a score, s_i (traditionally, a test statistic, or the p-value resulting from such a test statistic). These scores are often assumed to be independent and identically distributed, although there has been much work to relax the assumption of independence [2]. The FDR is computed as a function of a threshold, t, on these scores, FDR = FDR(t). For threshold t, all hypotheses with s_i ≥ t are said to be significant (assuming, without loss of generality, that the higher a score, the more we believe a hypothesis). The FDR at threshold t is then given by FDR(t) = E [
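The thresholding rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' estimation procedure: it computes only the realized proportion of false positives among hypotheses called significant, and it assumes a simulated setting in which the truth of each hypothesis is known. The function name `false_discovery_proportion`, the `is_true` labels, and the toy scores are all hypothetical. The FDR is the expectation of this quantity.

```python
from typing import Sequence


def false_discovery_proportion(
    scores: Sequence[float], is_true: Sequence[bool], t: float
) -> float:
    """Realized proportion of false positives among hypotheses with score >= t.

    `is_true` holds ground-truth labels, which are an assumption of this
    sketch: they are available in simulations but unknown in practice.
    """
    if len(scores) != len(is_true):
        raise ValueError("scores and is_true must have the same length")

    # A hypothesis is called significant when its score meets the threshold.
    called = [truth for s, truth in zip(scores, is_true) if s >= t]

    # Convention assumed here: with no discoveries, the proportion is 0.
    if not called:
        return 0.0

    false_positives = sum(1 for truth in called if not truth)
    return false_positives / len(called)


# Hypothetical toy data: five arc hypotheses with scores and known truth.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
is_true = [True, True, False, True, False]

for threshold in (0.85, 0.5, 0.0):
    fdp = false_discovery_proportion(scores, is_true, threshold)
    print(f"t = {threshold}: proportion of false discoveries = {fdp:.3f}")
```

Lowering the threshold calls more hypotheses significant and typically admits more false positives, which is why the FDR is treated as a function of t.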

Cite this paper

@inproceedings{Listgarten2008DeterminingTN,
  title={Determining the Number of Non-Spurious Arcs in a Learned DAG Model},
  author={Jennifer Listgarten and David Heckerman},
  year={2008}
}